Abstract

We show that the Bernstein-Hoeffding method can be applied to a larger
class of generalized moments. This class includes the exponential moments
whose properties play a key role in the proof of a well-known inequality of
Wassily Hoeffding for sums of independent and bounded random variables
whose mean is assumed to be known. As a result we can generalise and improve
upon this inequality. We show that Hoeffding's inequality is optimal in
a broader sense. Our approach allows us to obtain "missing" factors in
Hoeffding's inequality whose existence is motivated by the central limit theorem.
The latter result is a rather weaker version of a theorem that is due to Michel
Talagrand. Using ideas from the theory of Bernstein polynomials, we show
that the Bernstein-Hoeffding method can be adapted to the case in which one
has information on higher moments of the random variables. Moreover, we
consider the performance of the method under additional information on the
conditional distribution of the random variables and, finally, we show that
the method reduces to Markov's inequality when applied to non-negative
and unbounded random variables.
1 Introduction

1.1 Motivation and related work

For a given real $p \in (0,1)$ let $\mathcal{B}(p)$ be the set of all $[0,1]$-valued random variables
whose mean is equal to $p$. Formally,

$$\mathcal{B}(p) := \{X : 0 \le X \le 1,\ \mathbb{E}[X] = p\}.$$

The main motivation behind this work is the following, well-known, problem. 

Problem 1.1. Fix $n$ real numbers $p_1,\dots,p_n \in (0,1)$ and a real number, $t$, such that
$\sum_{i=1}^n p_i < t < n$. Find (or give upper bounds on)

$$\phi(p_1,\dots,p_n; t) := \sup_{X} \mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right],$$

where the supremum is taken over all random vectors $X = (X_1,\dots,X_n)$ of independent
random variables with $X_i \in \mathcal{B}(p_i)$, for $i \in \{1,\dots,n\}$.

If $t \le \sum_i p_i$ then the problem is trivial; just choose $X_i$ to be equal to $p_i$ with
probability 1. There is a vast amount of literature that is related to Problem 1.1. The
interested reader is invited to take a look at the works of Bentkus [3], Fan
et al., From [11], From et al. [12], Györfi et al. [13], Hoeffding [15], Kha et al.
[20], Krafft et al. [16], McDiarmid [17], Pinelis [23], [24], Schmidt et al. [28], Siegel
[30], Talagrand [31], Xia [32], among many others.

Determining the function $\phi(p_1,\dots,p_n; t)$, for given $p_1,\dots,p_n,t$, turns out to be a
notorious problem that has been around for many years. To our knowledge, no
solution to this problem has ever been reported and most of the existing work
focuses on obtaining upper bounds on the function $\phi(p_1,\dots,p_n; t)$ that are as
tight as possible.

Probably the first systematic approach that allows one to obtain upper bounds on
large deviations from the expectation for sums of independent, bounded random
variables was carried out by Hoeffding in [15]. Hoeffding's approach is based on
a method of Bernstein (see [15], page 14) and from now on will be referred to as
the Bernstein-Hoeffding method. The Bernstein-Hoeffding method is, briefly, the
following.
Markov's inequality and the assumption that the random variables are independent
imply that

$$\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right] \le e^{-ht} \prod_{i=1}^n \mathbb{E}\left[e^{hX_i}\right] \le e^{-ht} \left\{\frac{1}{n}\sum_{i=1}^n \mathbb{E}\left[e^{hX_i}\right]\right\}^n, \quad \text{for all } h > 0,$$

where the last inequality comes from the arithmetic-geometric means inequality.
By exploiting the fact that the function $f(t) = e^{ht}$ is convex one can show that

$$\mathbb{E}\left[e^{hX_i}\right] \le \mathbb{E}\left[e^{hB_i}\right],$$

where $B_i$ is a Bernoulli random variable of mean $p_i$. Hence we conclude that

$$\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right] \le e^{-ht} \left\{(1-p) + pe^{h}\right\}^n, \quad \text{for all } h > 0,$$

where $p = \frac{1}{n}\sum_{i=1}^n p_i$. If we minimise the expression on the right hand side of the
last inequality with respect to $h$, we find $e^h = \frac{t(1-p)}{p(n-t)}$ and hence we obtain the
following celebrated result of Hoeffding (see [15], Theorem 1).


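Theorem 1.2 (Hoeffding, 1963). Let $p_1,\dots,p_n$ be $n$ given real numbers from the
interval $(0,1)$. Let also the random variables $X_1,\dots,X_n$ be independent and such that
$X_i \in \mathcal{B}(p_i)$, for each $i = 1,\dots,n$. Set $p = \frac{1}{n}\sum_{i=1}^n p_i$. Then for any $t$ such that
$np < t < n$ we have

$$\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right] \le \inf_{h>0} \left\{e^{-ht}\left(1 - p + pe^{h}\right)^n\right\}.$$

Furthermore,

$$\inf_{h>0} \left\{e^{-ht}\left(1 - p + pe^{h}\right)^n\right\} = \left(\frac{p(n-t)}{t(1-p)}\right)^{t} \left(\frac{(1-p)n}{n-t}\right)^{n} := H(n,p,t).$$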


The function $H(n,p,t)$ in the last expression is the so-called Hoeffding bound (or
Hoeffding function) on tail probabilities for sums of independent, bounded random
variables. Throughout this paper, we will denote by $\mathrm{Ber}(q)$ a Bernoulli random
variable with mean $q$ and by $\mathrm{Bin}(n,q)$ a binomial random variable of parameters
$n$ and $q$. If two random variables $W, Z$ have the same distribution we will write
$W \sim Z$. We remark that the Hoeffding bound is sharp, in the sense that the
Bernoulli random variables $\mathrm{Ber}(p_i)$ attain the bound, i.e.,


$$\inf_{h>0}\ e^{-ht} \left\{\frac{1}{n}\sum_{i=1}^n \mathbb{E}\left[e^{hB_i}\right]\right\}^n = H(n,p,t),$$


where $B_i$ is a $\mathrm{Ber}(p_i)$ random variable. The main ideas behind this work are
hidden in the fact that

$$\prod_{i} \mathbb{E}\left[e^{hB_i}\right] = \mathbb{E}\left[e^{hB}\right],$$

where $B \sim \mathrm{Bin}(n,p)$ is a binomial random variable of parameters $n$ and
$p = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[B_i]$, and the fact that the function $f(x) = e^{hx}$, $h > 0$, is non-negative,
increasing and convex. In a subsequent section we will show that, while applying
the Bernstein-Hoeffding method, one can replace the exponential function
$e^{hx}$, $h > 0$, with any function $f(\cdot)$ having the aforementioned properties. Let
us mention that Hoeffding considered the tail probability $\mathbb{P}\left[\sum_{i=1}^n X_i \ge np + nt'\right]$,
where $0 < t' < 1 - p$, instead of the tail $\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right]$, where $np < t < n$, thus
obtaining a bound that looks different from the bound of the previous theorem. The
reader is invited to verify that the above bound is the same as the bound given by
formula (2.1) in [15]. We choose to work with the tail $\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right]$ because it
is better suited to our goals. A slightly looser but more widely used version of
Hoeffding's bound is the function $\exp\left(-2n(t/n - p)^2\right)$, which follows from the fact that
$H(n,p,t) \le \exp\left(-2n(t/n - p)^2\right)$ (see [15], formula (2.3)).
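To make the relation between these quantities concrete, the following is a minimal numerical sketch, not part of the original argument. It evaluates the Hoeffding function in the closed form of Theorem 1.2, the looser exponential bound, and the exact binomial tail, which is the tail of a sum of i.i.d. $\mathrm{Ber}(p)$ variables. The function names and the sample parameters are illustrative assumptions.

```python
import math


def hoeffding_bound(n, p, t):
    """Hoeffding function H(n, p, t) in the closed form of Theorem 1.2 (needs n*p < t < n)."""
    return (p * (n - t) / (t * (1 - p))) ** t * ((1 - p) * n / (n - t)) ** n


def exponential_bound(n, p, t):
    """The looser bound exp(-2 n (t/n - p)^2)."""
    return math.exp(-2 * n * (t / n - p) ** 2)


def binomial_tail(n, p, t):
    """Exact tail P[B >= t] for B ~ Bin(n, p)."""
    start = math.ceil(t)
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(start, n + 1))


if __name__ == "__main__":
    n, p, t = 100, 0.3, 40  # illustrative values only
    print("binomial tail    :", binomial_tail(n, p, t))
    print("Hoeffding bound  :", hoeffding_bound(n, p, t))
    print("exponential bound:", exponential_bound(n, p, t))
```

For any admissible parameters the three printed values should appear in increasing order.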


There exists quite some work dedicated to improving Hoeffding's bound. See for
example the work of Bentkus [3], Pinelis [24], Siegel [30] and Talagrand [31], just
to name a few references. Let us bring to the reader's attention the following two
results that are extracted from the papers of Talagrand ([31], Theorem 1.2) and
Bentkus ([3], Theorem 1.2). Talagrand's paper focuses on obtaining some "missing"
factors in Hoeffding's inequality whose existence is motivated by the Central
Limit Theorem (see [31], Section 1). These factors are obtained by combining the
Bernstein-Hoeffding method with a technique (i.e. a suitable change of
measure) that is used in the proof of Cramér's theorem on large deviations,
yielding the following.

Theorem 1.3 (Talagrand, 1995). Let $p_1,\dots,p_n$ be $n$ given real numbers from the
interval $(0,1)$. Let also the random variables $X_1,\dots,X_n$ be independent and such that
$X_i \in \mathcal{B}(p_i)$, for each $i = 1,\dots,n$. Set $p = \frac{1}{n}\sum_{i=1}^n p_i$. Then, for some absolute constant,
$K$, and every real number $t$ such that $np + K \le t \le np + np(1-p)/K$, we have

$$\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right] \le \left\{\Theta\left(\frac{t - np}{\sqrt{np(1-p)}}\right) + \frac{K}{\sqrt{np(1-p)}}\right\} \cdot H(n,p,t),$$

where $H(n,p,t)$ is the Hoeffding bound and $\Theta(\cdot)$ is a non-negative function such that

$$\frac{1}{\sqrt{2\pi}(1+x)} \le \frac{2}{\sqrt{2\pi}\left(x + \sqrt{x^2+4}\right)} \le \Theta(x) \le \frac{4}{\sqrt{2\pi}\left(3x + \sqrt{x^2+8}\right)}, \quad \text{for } x \ge 0.$$


See [31] for a proof of this theorem and for a precise definition of the function
$\Theta(\cdot)$. In other words, Talagrand's result improves upon Hoeffding's by inserting
a "missing" factor of order j in the Hoeffding bound. Notice that Talagrand's 
result holds true for $t \in [np + K, np + np(1-p)/K]$, for some absolute constant $K$
whose value does not seem to be known. Talagrand (see [31], page 692) mentions
that one can obtain a rather small numerical value for $K$, but numerical computations
are left to others with the talent for it. One of the purposes of this paper is to
improve upon Hoeffding's inequality by obtaining "missing" factors with exact
numerical values for the constants.

Part of Bentkus' paper performs comparisons between $\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right]$ and tails
of binomial and Poisson random variables. A crucial idea in the results of [3] is
to compare $\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right]$ with means of particular functions of certain random
variables. In particular, in the proof of Theorem 1.2 in [3] one can find the
following result.


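Theorem 1.4 (Bentkus, 2004). Let the random variables $X_1,\dots,X_n$ be independent and
such that $0 \le X_i \le 1$, for each $i = 1,\dots,n$. Set $p = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[X_i]$. Then, for any
positive real, $t$, such that $np < t < n$, we have

$$\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right] \le \inf_{a<t} \frac{1}{t-a}\, \mathbb{E}\left[\max\{0, B - a\}\right],$$

where $B \sim \mathrm{Bin}(n,p)$. Furthermore, if $t$ is additionally assumed to be a positive integer,
we have

$$\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right] \le e \cdot \mathbb{P}\left[B \ge t\right],$$

where $e = 2.718\ldots$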
The quantity on the right hand side of the first inequality is estimated in [3],
Lemma 4.2. We will see in the forthcoming sections that the first statement of
Bentkus' result is optimal in a slightly broader sense, i.e., it is the best bound that can
be obtained from the inequality

$$\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right] \le \frac{1}{f(t)}\, \mathbb{E}\left[f(B)\right],$$

where $f$ is a non-negative, convex and increasing function. Additionally, we will
improve upon the constant $e$ of the second statement.

1.2 Main results 

In this paper we shall be interested in applying the Bernstein-Hoeffding method
to a larger class of generalized moments. Such approaches have already been
carried out by Bentkus [3], Eaton [6], Pinelis [22], [24]. Nevertheless, we were not able
to find a systematic study of the classes of functions that are considered in our
paper. We now proceed by defining a class of functions that is appropriate for
the Bernstein-Hoeffding method. Let us call a function $f : [0,+\infty) \to [0,+\infty)$
sub-multiplicative if $f(x+y) \le f(x) \cdot f(y)$, for all $x,y \in [0,+\infty)$. We will denote by $\mathcal{F}_{sic}$
the set of all functions $f : [0,+\infty) \to [0,+\infty)$ that are sub-multiplicative, increasing
and convex. Examples of such functions are $e^{hx}$, for fixed $h > 0$, $e^{hx}(1 + cx)$,
for fixed $h, c > 0$, and so on. Our first result shows that the Bernstein-Hoeffding
method can be adjusted to the class $\mathcal{F}_{sic}$.

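Theorem 1.5. Let $\mathcal{F}_{sic}$ be defined as above. Let the random variables $X_1,\dots,X_n$ be
independent and such that $0 \le X_i \le 1$, for each $i = 1,\dots,n$. Set $p = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[X_i]$.
Then, for any positive real, $t$, such that $np < t < n$, we have

$$\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right] \le \inf_{f \in \mathcal{F}_{sic}} \frac{1}{f(t)} \left\{(1-p)f(0) + pf(1)\right\}^n.$$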



We prove this result in Section 2. Theorem 1.5 can be deduced using the same
argument as Hoeffding's result. Its proof ought to be somewhere in the literature
but we were unable to locate a reference. We provide a proof for the sake of
completeness. Additionally, we prove in Section 2 that Hoeffding's bound is the best
bound that can be obtained using functions from the class $\mathcal{F}_{sic}$. In Section 3.1 we
extend Theorem 1.5 to an even larger class of moments. More precisely, fix $t > 0$
and let $\mathcal{F}_{ic}(t)$ be the set consisting of all convex functions $f : [0,\infty) \to [0,\infty)$ that
are increasing in the interval $[t,\infty)$. Examples of such functions are $|x - \varepsilon|$, for fixed
$\varepsilon < t$, $\max(0, x - \varepsilon)$ for fixed $\varepsilon < t$, $e^{hx}$ for $h > 0$, and so on. In Section 3.1, by
employing ideas from the theory of convex orders, we obtain the following.


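Theorem 1.6. Let the random variables $X_1,\dots,X_n$ be independent and such that
$0 \le X_i \le 1$, for each $i = 1,\dots,n$. Set $p = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[X_i]$. Then, for any fixed real number, $t$,
such that $np < t < n$, we have

$$\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right] \le \inf_{f \in \mathcal{F}_{ic}(t)} \frac{1}{f(t)}\, \mathbb{E}\left[f(B)\right],$$

where $B \sim \mathrm{Bin}(n,p)$ is a binomial random variable and $\mathcal{F}_{ic}(t)$ is the class of functions
defined above.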

In Section 3.2 we show that the functions $f \in \mathcal{F}_{ic}(t)$ that minimise $\frac{1}{f(t)}\mathbb{E}[f(B)]$ are
those used in the aforementioned result of Bentkus, i.e., Theorem 1.4. We then
choose a particular function $f \in \mathcal{F}_{ic}(t)$ and obtain a version of Talagrand's result
having exact numerical constants. More precisely, in Section 3.3 we prove the
following improvement upon Hoeffding's inequality.


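Theorem 1.7. Let the random variables $X_1,\dots,X_n$ be independent and such that
$0 \le X_i \le 1$, for each $i = 1,\dots,n$. Set $p = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[X_i]$. Let $t$ be a fixed positive integer
such that $\frac{enp}{ep - p + 1} < t < n$. Then

$$\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right] \le \frac{1+h}{e^{h}} \cdot \left(H(n,p,t) - T(n,p,t;h)\right) + \left(1 - \frac{1+h}{e^{h}}\right) \mathbb{P}\left[B_{n,p} = t\right],$$

where $H(n,p,t)$ is the Hoeffding function, $B_{n,p}$ is a binomial random variable of
parameters $n$ and $p$,

$$T(n,p,t;h) = \sum_{i=0}^{t-1} e^{h(i-t)}\, \mathbb{P}\left[B_{n,p} = i\right],$$

and $h$ is such that $e^{h} = \frac{t(1-p)}{p(n-t)}$, i.e., it is the optimal real such that

$$\frac{1}{e^{ht}}\, \mathbb{E}\left[e^{hB}\right] = \inf_{s>0} \frac{1}{e^{st}}\, \mathbb{E}\left[e^{sB}\right],$$

with $B \sim \mathrm{Bin}(n,p)$.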
Let us illustrate that the bound of the previous result is an improvement upon
Hoeffding's inequality. Indeed, notice that the bound of the previous theorem
is

$$\le \frac{1+h}{e^{h}} \cdot H(n,p,t) + \left(1 - \frac{1+h}{e^{h}}\right) \mathbb{P}\left[B_{n,p} = t\right]$$

and the latter quantity is a convex combination of $H(n,p,t)$ and $\mathbb{P}[B_{n,p} = t]$. Now
Hoeffding's Theorem 1.2 implies that

$$\mathbb{P}\left[B_{n,p} = t\right] \le \mathbb{P}\left[B_{n,p} \ge t\right] \le H(n,p,t)$$

and therefore the bound of the previous result is smaller than Hoeffding's.

In brief, the previous result improves upon Hoeffding's by adding a "missing"
factor that is equal to $\frac{1+h}{e^h} < 1$. Since $e^h = \frac{t(1-p)}{p(n-t)}$, it follows that the "missing" factor
can be written as

$$\frac{1+h}{e^{h}} = \frac{p}{1-p}\left(\frac{n}{t} - 1\right)\left(1 + \ln\frac{1-p}{p(n/t - 1)}\right).$$

On the other hand, Talagrand's result provides a factor that is approximately

$$\frac{\sqrt{np(1-p)}}{\sqrt{2\pi}\left(\sqrt{np(1-p)} + t - np\right)} + \frac{K}{\sqrt{np(1-p)}}.$$

It is unclear how to compare the two factors without knowing the constant $K$. If
we assume that $K \approx 0$ then elementary, though quite tedious, calculations show
that Talagrand's bound is sharper than the bound of Theorem 1.7. Our bound has
the advantage that it does not involve unknown constants and that it is obtained
using a rather simple argument.
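As a rough illustration of the bound in Theorem 1.7, the sketch below evaluates it next to the Hoeffding function. It assumes that $e^h = t(1-p)/(p(n-t))$ and that $T(n,p,t;h)$ is the sum of $e^{h(i-t)}\,\mathbb{P}[B_{n,p} = i]$ over $i < t$, as in the statement of the theorem. The function names and sample values are hypothetical.

```python
import math


def binom_pmf(n, p, k):
    """P[B = k] for B ~ Bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def hoeffding_bound(n, p, t):
    """Hoeffding function H(n, p, t)."""
    return (p * (n - t) / (t * (1 - p))) ** t * ((1 - p) * n / (n - t)) ** n


def improved_bound(n, p, t):
    """Right hand side of Theorem 1.7 for an integer t with e*n*p/(e*p - p + 1) < t < n."""
    if not (math.e * n * p / (math.e * p - p + 1) < t < n):
        raise ValueError("t is outside the range assumed by Theorem 1.7")
    h = math.log(t * (1 - p) / (p * (n - t)))  # Hoeffding's optimal h
    factor = (1 + h) / math.exp(h)             # the "missing" factor
    lower_part = sum(math.exp(h * (i - t)) * binom_pmf(n, p, i) for i in range(t))
    return factor * (hoeffding_bound(n, p, t) - lower_part) + (1 - factor) * binom_pmf(n, p, t)


if __name__ == "__main__":
    n, p, t = 100, 0.3, 60  # illustrative values only
    print("Theorem 1.7 bound:", improved_bound(n, p, t))
    print("Hoeffding bound  :", hoeffding_bound(n, p, t))
```

The range check mirrors the hypothesis on $t$, which is what guarantees $h > 1$.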

Using Theorem 1.6 we can also obtain the following, partial, improvement upon
the second statement of Bentkus' result, i.e., Theorem 1.4.

Theorem 1.8. Let the random variables $X_1,\dots,X_n$ be independent and such that
$0 \le X_i \le 1$, for each $i = 1,\dots,n$. Set $p = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[X_i]$. Then, for any fixed positive
integer, $t$, such that $np < t < n$, we have

$$\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right] \le \frac{t - tp}{t - np} \cdot \mathbb{P}\left[B \ge t\right],$$

where $B \sim \mathrm{Bin}(n,p)$ is a binomial random variable.
Note that for large $t$, say $t \ge \frac{2np}{1+p}$, the previous result gives an estimate for which
$\frac{t-tp}{t-np} \le 2$. However, for values of $t$ that are close to $np$, the previous result provides
estimates for which $\frac{t-tp}{t-np}$ can be arbitrarily large.
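The behaviour of the factor $\frac{t-tp}{t-np}$ can be checked numerically. The sketch below is only an illustration under the notation of Theorem 1.8; rearranging $\frac{t-tp}{t-np} \le 2$ gives the equivalent condition $t \ge \frac{2np}{1+p}$, which is the threshold used in the example. The function names are assumptions made here.

```python
import math


def binomial_tail(n, p, t):
    """Exact tail P[B >= t] for B ~ Bin(n, p) and an integer t."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t, n + 1))


def comparison_factor(n, p, t):
    """The factor (t - t p) / (t - n p) of Theorem 1.8 (needs n*p < t < n)."""
    return (t - t * p) / (t - n * p)


def theorem_1_8_bound(n, p, t):
    """Right hand side of Theorem 1.8."""
    return comparison_factor(n, p, t) * binomial_tail(n, p, t)


if __name__ == "__main__":
    n, p = 100, 0.3  # illustrative values only
    threshold = 2 * n * p / (1 + p)
    for t in (31, 46, 47, 60):
        print(t, t >= threshold, comparison_factor(n, p, t), theorem_1_8_bound(n, p, t))
```

Close to $np$ the factor exceeds Bentkus' constant $e$, while beyond the threshold it drops below 2.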

In Section 4.1 we generalise the Bernstein-Hoeffding method to sums of bounded,
independent random variables for which the first $m$ moments are known. More
precisely, for given real numbers $\mu_1,\dots,\mu_m \in (0,1)$, let $\mathcal{B}(\mu_1,\dots,\mu_m)$ be the set
of all $[0,1]$-valued random variables whose $i$-th moment equals $\mu_i$, $i = 1,\dots,m$.
Formally,

$$\mathcal{B}(\mu_1,\dots,\mu_m) := \{X : 0 \le X \le 1,\ \mathbb{E}[X] = \mu_1,\ \mathbb{E}[X^2] = \mu_2,\ \dots,\ \mathbb{E}[X^m] = \mu_m\}.$$

Notice that the set may be empty. Note also that, if $\mathcal{B}(\mu_1,\dots,\mu_m)$ is non-empty,
then we have $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_m$. Recall the definition of the class $\mathcal{F}_{sic}$, defined
above. The main result of Section 4.1 is the following.

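Theorem 1.9. Fix positive integers $n, m \ge 2$ and for $i = 1,\dots,n$ let $\{\mu_{ij}\}_{j=1}^{m}$ be a
sequence of real numbers such that $1 > \mu_{i1} \ge \cdots \ge \mu_{im} > 0$ and for which the class
$\mathcal{B}(\mu_{i1},\dots,\mu_{im})$ is non-empty. Let $X_1,\dots,X_n$ be independent random variables such that
$X_i \in \mathcal{B}(\mu_{i1},\dots,\mu_{im})$, for $i = 1,\dots,n$, and fix $t \in [0,n]$. Then

$$\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right] \le \inf_{f \in \mathcal{F}_{sic}} \frac{1}{f(t)} \left\{\mathbb{E}\left[f(T_{nm})\right]\right\}^n,$$

where $T_{nm}$ is the random variable that takes on values in the set $\left\{0, \frac{1}{m}, \dots, \frac{m}{m}\right\}$ and, for
$j = 0,1,\dots,m$, it satisfies

$$\mathbb{P}\left[T_{nm} = \frac{j}{m}\right] = \frac{1}{n}\sum_{i=1}^n \binom{m}{j}\, \mathbb{E}\left[X_i^{j}(1 - X_i)^{m-j}\right].$$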

To our knowledge, this is the first result that considers the performance of the
method under additional information on higher moments. Notice that the probability
distribution of the random variable $T_{nm}$ does not depend on the random
variables $X_1,\dots,X_n$. Indeed, using the binomial formula, it is easy to see that

$$\mathbb{E}\left[X_i^{j}(1 - X_i)^{m-j}\right] = \sum_{k=0}^{m-j} \binom{m-j}{k} (-1)^{k}\, \mu_{i,j+k}, \quad \text{with the convention } \mu_{i,0} := 1,$$

and so $T_{nm}$ is uniquely determined by the given sequences of moments $\{\mu_{ij}\}_{j=1}^{m}$, $i = 1,\dots,n$.
We will refer to the random variable that takes the value $\frac{j}{m}$ in the set $\left\{0, \frac{1}{m}, \dots, \frac{m}{m}\right\}$ with
probability $\binom{m}{j}\mathbb{E}\left[X_i^{j}(1 - X_i)^{m-j}\right]$, for $X_i \in \mathcal{B}(\mu_{i1},\dots,\mu_{im})$, as a $\mathcal{B}(\mu_{i1},\dots,\mu_{im})$-Bernstein
random variable. Let us also mention that Bernstein random variables occur
in the study of the so-called Hausdorff moment problem (see Feller [10]). A similar
result holds true for the class $\mathcal{F}_{ic}(t)$; we state this result in Section 4.1 and sketch
its proof. In Section 4.2 we perform comparisons between $\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right]$ and binomial
tails that depend on the additional information on the moments. In Section
5 we study the performance of the method on a certain class of bounded random
variables that contain additional information on conditional means and/or conditional
distributions. We find random variables that are larger, in the sense of
convex order, than any random variable from this class and prove similar results
as above that take into account the additional information. Our approach is based
on the notion of mixtures of random variables. Additionally, we construct random
variables $\xi_{p,\sigma}$ that are different from Bernstein random variables and are larger, in
the sense of convex order, than any random variable from the class $\mathcal{B}(p,\sigma^2)$,
consisting of all random variables in $\mathcal{B}(p)$ whose variance is $\sigma^2$. In particular, in
Section 5 we prove the following.


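Theorem 1.10. Fix a positive integer $n$ and assume that, for $i = 1,\dots,n$, we are given a
pair $(p_i, \sigma_i^2)$ for which the class $\mathcal{B}(p_i, \sigma_i^2)$ is non-empty. Let $X_1,\dots,X_n$ be independent
random variables such that $X_i \in \mathcal{B}(p_i, \sigma_i^2)$, for $i = 1,\dots,n$. Set $p = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[X_i]$ and fix
$t$ such that $np < t < n$. Then

$$\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right] \le \inf_{f \in \mathcal{F}_{ic}(t)} \frac{1}{f(t)}\, \mathbb{E}\left[f\left(\sum_{i=1}^n \xi_{p_i,\sigma_i}\right)\right],$$

where, for $i = 1,\dots,n$, the random variable $\xi_{p_i,\sigma_i}$ is given by Lemma 5.5. Furthermore,
the infimum on the right hand side is attained by a function of the form $\max\{0, x - \varepsilon\}$,
for some $\varepsilon \in [0,t)$.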


Finally, in Section 6 we show that the Bernstein-Hoeffding method reduces to
Markov's inequality when applied to non-negative and unbounded random variables.


2 Sub-multiplicative order

This section is devoted to the proof of Theorem 1.5. The proof will make use
of the following elementary lemma, which is of independent interest.
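Lemma 2.1. Fix real numbers $a, b$ such that $a < b$. Let $X$ be a random variable that takes
values on the interval $[a,b]$ and is such that $\mathbb{E}[X] = p$. Let $B$ be the random variable that
takes on the values $a$ and $b$ with probabilities $\frac{b-p}{b-a}$ and $\frac{p-a}{b-a}$, respectively. Then for any
convex function, $f : [a,b] \to \mathbb{R}$, we have

$$\mathbb{E}[f(X)] \le \mathbb{E}[f(B)].$$

Proof. Given $X$, we couple the random variables by setting $B_X$ to be either equal
to $a$ with probability $\frac{b-X}{b-a}$ or equal to $b$ with probability $\frac{X-a}{b-a}$. It is easy to see that
$\mathbb{E}[B_X \mid X] = X$ and so

$$\mathbb{E}[B_X] = \mathbb{E}\left[\mathbb{E}[B_X \mid X]\right] = \mathbb{E}[X] = p.$$

Since $B_X$ takes only the values $a$ and $b$ and has mean $p$, it has the same distribution as $B$.
Jensen's inequality now implies that

$$\mathbb{E}[f(X)] = \mathbb{E}\left[f\left(\mathbb{E}[B_X \mid X]\right)\right] \le \mathbb{E}\left[\mathbb{E}[f(B_X) \mid X]\right] = \mathbb{E}[f(B_X)],$$

as required. □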
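Lemma 2.1 is easy to sanity-check numerically. The following sketch compares $\mathbb{E}[f(X)]$ with $\mathbb{E}[f(B)]$ for a discrete $X$ on $[0,1]$ and a few convex functions; the distribution and the test functions are arbitrary choices made for illustration and do not come from the text.

```python
import math


def expectation(f, values, probs):
    """E[f(X)] for a discrete X taking values[i] with probability probs[i]."""
    return sum(q * f(x) for x, q in zip(values, probs))


def two_point_expectation(f, a, b, mean):
    """E[f(B)] for the two-point variable B of Lemma 2.1 on {a, b} with the given mean."""
    return (b - mean) / (b - a) * f(a) + (mean - a) / (b - a) * f(b)


if __name__ == "__main__":
    values = [0.0, 0.25, 0.5, 1.0]   # hypothetical support inside [0, 1]
    probs = [0.1, 0.4, 0.3, 0.2]     # hypothetical probabilities
    mean = expectation(lambda x: x, values, probs)
    convex_functions = {
        "exp(2x)": lambda x: math.exp(2 * x),
        "max(0, x - 0.3)": lambda x: max(0.0, x - 0.3),
        "(x - 0.5)^2": lambda x: (x - 0.5) ** 2,
    }
    for name, f in convex_functions.items():
        print(name, expectation(f, values, probs), two_point_expectation(f, 0.0, 1.0, mean))
```

In each printed row the first number should not exceed the second.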


We are now ready to prove our first main result. 


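Proof of Theorem 1.5. Set $S_n := \sum_{i=1}^n X_i$ and fix a function $f \in \mathcal{F}_{sic}$. By Markov's
inequality, independence and the assumption that $f$ is increasing and
sub-multiplicative, we conclude that

$$\begin{aligned}
\mathbb{P}[S_n \ge t] &\le \mathbb{P}[f(S_n) \ge f(t)] && (f \text{ is increasing}) \\
&\le \frac{1}{f(t)}\, \mathbb{E}\left[f\left(\sum_{i=1}^n X_i\right)\right] && (\text{Markov's inequality}) \\
&\le \frac{1}{f(t)}\, \mathbb{E}\left[\prod_{i=1}^n f(X_i)\right] && (\text{sub-multiplicativity}) \\
&= \frac{1}{f(t)} \prod_{i=1}^n \mathbb{E}[f(X_i)] && (\text{independence}).
\end{aligned}$$

Since the function $f(x)$ is convex, Lemma 2.1 implies that

$$\mathbb{E}[f(X_i)] \le \mathbb{E}[f(B_i)], \quad \text{where } B_i \sim \mathrm{Ber}(p_i) \text{ and } p_i := \mathbb{E}[X_i].$$

Hence

$$\mathbb{P}[S_n \ge t] \le \frac{1}{f(t)} \prod_{i=1}^n \mathbb{E}[f(B_i)].$$

Now the arithmetic-geometric means inequality yields

$$\prod_{i=1}^n \mathbb{E}[f(B_i)] \le \left\{\frac{1}{n}\sum_{i=1}^n \mathbb{E}[f(B_i)]\right\}^n = \left\{(1-p)f(0) + pf(1)\right\}^n$$

and thus

$$\mathbb{P}[S_n \ge t] \le \frac{1}{f(t)} \left\{(1-p)f(0) + pf(1)\right\}^n, \quad f \in \mathcal{F}_{sic}.$$

The result follows. □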


The first statement in Hoeffding's result (Theorem 1.2) is obtained by applying
the previous proof to the function $f(t) = e^{ht}$, $h > 0$, which clearly belongs to $\mathcal{F}_{sic}$.
For $f \in \mathcal{F}_{sic}$, let

$$V_n(f,t) := \frac{1}{f(t)} \left\{(1-p)f(0) + pf(1)\right\}^n.$$

Notice that Theorem 1.5 suggests that there may be some room for improvement
upon Hoeffding's bound, i.e., there may exist a function $\phi \in \mathcal{F}_{sic}$ such that
$V_n(\phi,t) < V_n(e^{hx},t)$ for all $h$. We now show that this is not the case, when $t$ is an
integer. The following result solves the problem of finding $\inf_{f \in \mathcal{F}_{sic}} V_n(f,t)$ in case
$t$ is a positive integer.


Proposition 2.2. Let $t$ be a positive integer. Suppose that $f \in \mathcal{F}_{sic}$ is such that $V_n(f,t) \le
V_n(f',t)$ for all $f' \in \mathcal{F}_{sic}$. Then there exists $g \in \mathcal{F}_{sic}$ such that $V_n(g,t) = V_n(f,t)$ and
$g(x) = e^{hx}$, for some positive constant $h$.

Proof. Since $f(\cdot)$ is sub-multiplicative and non-negative, it is easy to see that $f(0) \ge 1$.
For $x \ge 0$, set $g(x) = f(1)^x$. Then $g(0) = 1$, $g(1) = f(1)$ and so

$$\left\{(1-p)g(0) + pg(1)\right\}^n \le \left\{(1-p)f(0) + pf(1)\right\}^n.$$

Since $t$ is a positive integer, it follows that $f(t) \le f(1)^t = g(t)$. Hence $V_n(g,t) \le
V_n(f,t)$. The result follows upon setting $h = \ln f(1)$. □

Hence, for integer $t$, we have $\inf_f V_n(f,t) = H(n,p,t)$, where the infimum is taken
over all functions $f \in \mathcal{F}_{sic}$ and $H(n,p,t)$ is the Hoeffding function that is defined
in the introduction. Quoting Hoeffding (see [15], page 15), the bound

$$\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right] \le H(n,p,t)$$

is the best that can be obtained from the inequality

$$\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right] \le e^{-ht} \prod_{i=1}^n \mathbb{E}\left[e^{hB_i}\right], \quad h > 0,$$

where $B_i \sim \mathrm{Ber}(\mathbb{E}[X_i])$. This follows from the fact that $H(n,p,t)$ is obtained by
minimising the expression on the right hand side of the above inequality with
respect to $h > 0$. Proposition 2.2 shows that, in case $t$ is a positive integer,
Hoeffding's bound is the best that can be obtained in a slightly broader sense, i.e.,
$H(n,p,t)$ is the best bound on $\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right]$ that can be obtained by minimising
$\frac{1}{f(t)}\prod_{i=1}^n \mathbb{E}[f(B_i)]$ with respect to $f \in \mathcal{F}_{sic}$.


3 Convex increasing order 

3.1 Proof of Theorem 1.6

In this section we prove Theorem 1.6 and show that the Hoeffding bound can be
improved using a larger class of functions, namely the class $\mathcal{F}_{ic}(t)$, defined in the
introduction. Once again, Theorem 1.6 implies that there may be some room for
improvement upon Hoeffding's bound. We will employ this result and en route
find a function $f \in \mathcal{F}_{ic}(t)$ such that

$$\frac{1}{f(t)}\, \mathbb{E}[f(B)] < H(n,p,t),$$

where $B \sim \mathrm{Bin}(n,p)$. Hence there is indeed room for improvement upon Hoeffding's
bound. The proof of Theorem 1.6 will require some well-known results and
the following notion of ordering between random variables (see [29]).

Definition 1. Let $X$ and $Y$ be two random variables such that

$$\mathbb{E}[f(X)] \le \mathbb{E}[f(Y)], \quad \text{for all convex functions } f : \mathbb{R} \to \mathbb{R},$$

provided the expectations exist. Then $X$ is said to be smaller than $Y$ in the convex order,
denoted $X \le_{cx} Y$.

The following two lemmas are well-known (see Theorems 3.A12 and 3.A37 in
[29] and Theorem 4 in [15]). The first one shows that convex order is closed under
convolutions.
Lemma 3.1. Let $X_1,\dots,X_n$ be a set of independent random variables and let $Y_1,\dots,Y_n$
be another set of independent random variables. If $X_i \le_{cx} Y_i$, for $i = 1,\dots,n$, then

$$\sum_{i=1}^n X_i \le_{cx} \sum_{i=1}^n Y_i.$$

The second lemma shows that a sum of independent Bernoulli random variables
is dominated, in the sense of convex order, by a certain binomial random variable.


Lemma 3.2. Fix $n$ real numbers $p_1,\dots,p_n$ from $(0,1)$. Let $B_1,\dots,B_n$ be independent
Bernoulli random variables with $B_i \sim \mathrm{Ber}(p_i)$. Then

$$\sum_{i=1}^n B_i \le_{cx} B,$$

where $B \sim \mathrm{Bin}(n,p)$ is a binomial random variable of parameters $n$ and $p := \frac{1}{n}\sum_{i=1}^n p_i$.

The proof of Theorem 1.6 is basically an extension of the proof of Theorem 1.5.


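Proof of Theorem 1.6. Fix $f \in \mathcal{F}_{ic}(t)$. Since $f(\cdot)$ is non-negative and increasing in
$[t,\infty)$, Markov's inequality implies that

$$\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right] \le \frac{1}{f(t)}\, \mathbb{E}\left[f\left(\sum_{i=1}^n X_i\right)\right].$$

Since $f(\cdot)$ is convex, Lemmata 2.1 and 3.1 imply that

$$\mathbb{E}\left[f\left(\sum_{i=1}^n X_i\right)\right] \le \mathbb{E}\left[f\left(\sum_{i=1}^n B_i\right)\right],$$

where $B_i \sim \mathrm{Ber}(\mathbb{E}[X_i])$, $i = 1,\dots,n$. Now Lemma 3.2 yields

$$\mathbb{E}\left[f\left(\sum_{i=1}^n B_i\right)\right] \le \mathbb{E}[f(B)]$$

and the result follows. □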


Similar ideas have been applied to sums of independent Bernoulli
random variables by Leon and Perron in [19]. In a subsequent section we employ
Theorem 1.6 and en route improve upon Hoeffding's inequality by inserting
certain "missing" factors. Before doing so, we state some results regarding the
optimal function in the class $\mathcal{F}_{ic}(t)$.
3.2 Optimal functions in $\mathcal{F}_{ic}(t)$

Let the random variables $X_1,\dots,X_n$ be independent and such that $0 \le X_i \le 1$,
for each $i = 1,\dots,n$. Set $p = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[X_i]$ and fix a real number, $t$, such that
$np < t < n$. We have shown in the previous section that, for $f \in \mathcal{F}_{ic}(t)$, we
have

$$\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right] \le \frac{1}{f(t)}\, \mathbb{E}[f(B)],$$

where $B \sim \mathrm{Bin}(n,p)$. Set

$$T_n(f,t) := \frac{1}{f(t)}\, \mathbb{E}[f(B)].$$

In this section we solve the problem of finding $\inf_f T_n(f,t)$, where the infimum is
taken over all functions $f \in \mathcal{F}_{ic}(t)$. We show that the solution is related to
Bentkus' result. We begin with an observation on the optimal function.

Lemma 3.3. Let $\phi \in \mathcal{F}_{ic}(t)$ be a function such that $T_n(\phi,t) = \inf_f T_n(f,t)$, where the
infimum is taken over all functions $f \in \mathcal{F}_{ic}(t)$. Then we may assume that $\phi(t) = 1$.

Proof. If $\phi(t) \ne 1$, then we set $\phi_1(x) = \frac{\phi(x)}{\phi(t)}$, $x \ge 0$. □

Using this result we can find functions $f \in \mathcal{F}_{ic}(t)$ that minimise $T_n(f,t)$.

Theorem 3.4. Let $\phi \in \mathcal{F}_{ic}(t)$ be such that $T_n(\phi,t) = \inf_f T_n(f,t)$, where the infimum
is taken over all functions $f \in \mathcal{F}_{ic}(t)$. Then $\phi(x)$ equals $\max\left\{0, \frac{1}{t-\varepsilon} \cdot (x - \varepsilon)\right\}$, for some
$\varepsilon \in [0,t)$.

Proof. We may assume that $\phi(t) = 1$ and so $\phi$ is such that

$$\inf_{f} \sum_{i=0}^{n} f(i)\, \mathbb{P}[B = i] = T_n(\phi,t),$$

where the infimum is taken over the set containing all functions $f \in \mathcal{F}_{ic}(t)$
such that $f(t) = 1$. Let $m_t := \min\{n \in \mathbb{N} : t < n\}$ be the smallest positive integer
that is larger than $t$. Note that, by definition, $0 < m_t - t \le 1$. For $x \ge 0$, define the
function
In other words, $\phi^*(\cdot)$ equals zero for $x \le \varepsilon$ and for $x > \varepsilon$ it is a straight line starting
from the point $(\varepsilon, 0) \in \mathbb{R}^2$ and passing through the points $(t,\phi(t))$ and $(m_t,\phi(m_t))$.
Note that $\varepsilon < t$ and that $\varepsilon \ge 0$; indeed, if $\varepsilon < 0$, then $\phi(0) > 0$ and the function
0i (x) = _)_ 0(o) would be such that ^ + 0(0) = E[0 1 ( J B)] < E[0(H)], 

which implies that $\mathbb{E}[\phi(B)]$ is even worse than the bound obtained by Markov's
inequality, hence contradicts its optimality. Since the function $\phi(\cdot)$ is convex it
follows that for every integer $k$ in the interval $[0,n]$ we have $\phi^*(k) \le \phi(k)$ and this,
in turn, implies that

$$T_n(\phi^*,t) \le T_n(\phi,t),$$

as required. □

This yields the following. 

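Corollary 3.5. Let the parameters $n,p,t$ be as in Theorem 1.6. Then for any $t \in (np,n)$
we have

$$\inf_{f \in \mathcal{F}_{ic}(t)} \frac{1}{f(t)}\, \mathbb{E}[f(B)] = \inf_{a<t} \frac{1}{t-a}\, \mathbb{E}\left[\max\{0, B - a\}\right],$$

where $B \sim \mathrm{Bin}(n,p)$ and the infimum on the left hand side is taken over all functions
$f \in \mathcal{F}_{ic}(t)$.

Notice that we can write the function $\phi_\varepsilon(x) := \max\left\{0, \frac{1}{t-\varepsilon} \cdot (x - \varepsilon)\right\}$, for $\varepsilon \in [0,t)$, in
the form $g_h(x) := \max\{0, h \cdot (x - t) + 1\}$, where $h = \frac{1}{t-\varepsilon}$, and that this correspondence
is injective. Notice also that, since $\varepsilon \ge 0$, we have $h \ge 1/t$. The following
question arises naturally from Corollary 3.5.

Question 3.6. What is the optimal $\varepsilon$ such that

$$\inf_{a<t} \frac{1}{t-a}\, \mathbb{E}\left[\max\{0, B - a\}\right] = \mathbb{E}\left[\phi_\varepsilon(B)\right]?$$

We remark that such an $\varepsilon$ will satisfy $\varepsilon \le \lceil t \rceil - 1$, where $\lceil t \rceil := \min\{k \in \mathbb{N} : t \le k\}$.
To see this notice that if $\varepsilon > \lceil t \rceil - 1$, then $\phi_\varepsilon(\lceil t \rceil - 1) = 0$ and we may decrease
$\varepsilon$, until it reaches the point $\lceil t \rceil - 1$, without increasing the value $\mathbb{E}[\phi_\varepsilon(B)]$. Since
$\varepsilon \le \lceil t \rceil - 1$ it follows that $h \le \frac{1}{t - \lceil t \rceil + 1}$. Now, finding the optimal $\varepsilon$ is equivalent
to finding the optimal $h$. We are not able to find this $h$ in closed form. Nevertheless, due to the
following result, one can easily find $h$ using, say, a binary search algorithm.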
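Proposition 3.7. Let the parameters $n,p,t$ be as in Theorem 1.6. Let $h > 0$ be such that

$$\mathbb{E}\left[\max\{0, h \cdot (B - t) + 1\}\right] = \inf_{s>0} \mathbb{E}\left[\max\{0, s \cdot (B - t) + 1\}\right],$$

where $B \sim \mathrm{Bin}(n,p)$. Then we may assume that $h = \frac{1}{t-j}$, for some non-negative integer
$j \in \{0,1,\dots,\lceil t \rceil - 1\}$.

Proof. Recall that $h \in \left[\frac{1}{t}, \frac{1}{t+1-\lceil t \rceil}\right]$. We have

$$\mathbb{E}\left[g_h(B)\right] = \sum_{i=0}^{n} \binom{n}{i} p^{i}(1-p)^{n-i} \max\{0, h \cdot (i - t) + 1\}.$$

The function $E(h) := \mathbb{E}[g_h(B)]$ is linear on the interval $\left[\frac{1}{t-j}, \frac{1}{t-j-1}\right]$, for every
$j \in \{0,1,\dots,\lceil t \rceil - 1\}$. Hence the function $E(h)$ is continuous and piecewise linear
on the interval $\left[\frac{1}{t}, \frac{1}{t+1-\lceil t \rceil}\right]$ and this implies that it attains its minimum at the
endpoints of $\left[\frac{1}{t-j}, \frac{1}{t-j-1}\right]$, for some $j \in \{0,1,\dots,\lceil t \rceil - 1\}$. The result follows. □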
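Since Proposition 3.7 restricts the search to the finitely many slopes $h = 1/(t-j)$, the optimal value in Corollary 3.5 can be computed by direct enumeration rather than by a search over a continuum. The sketch below does exactly that; it is an illustration with hypothetical function names, and a plain linear scan is used instead of binary search for simplicity.

```python
import math


def binom_pmf(n, p, k):
    """P[B = k] for B ~ Bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def piecewise_linear_moment(n, p, t, h):
    """E[max{0, h (B - t) + 1}] for B ~ Bin(n, p)."""
    return sum(max(0.0, h * (i - t) + 1) * binom_pmf(n, p, i) for i in range(n + 1))


def best_slope(n, p, t):
    """Scan the candidate slopes h = 1/(t - j), j = 0, ..., ceil(t) - 1, and return the best one."""
    candidates = [1.0 / (t - j) for j in range(math.ceil(t))]
    return min(candidates, key=lambda h: piecewise_linear_moment(n, p, t, h))


if __name__ == "__main__":
    n, p, t = 100, 0.3, 40  # illustrative values only
    h = best_slope(n, p, t)
    print("optimal h:", h, "bound:", piecewise_linear_moment(n, p, t, h))
```

The returned slope corresponds to $\varepsilon = t - 1/h$ in the parametrisation $\phi_\varepsilon$ used above.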

In the next section we obtain an improvement upon Hoeffding's bound. 


3.3 An improvement upon Hoeffding's bound 

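In this section we collect results that can be obtained by employing Theorem 1.6.
We begin with the proof of Theorem 1.7.

Proof of Theorem 1.7. Given $h > 0$ define the function $f(x) = \max\{0, h(x - t) + 1\}$,
for $x \ge 0$. It is easy to see that $f \in \mathcal{F}_{ic}(t)$. Let $m_t$ be the largest positive integer for
which $f(m_t) = 0$. Using Theorem 1.6 and the inequality $e^x \ge 1 + x$, for $x \in \mathbb{R}$, we
estimate

$$\begin{aligned}
\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right] &\le \mathbb{E}[f(B)] \\
&= \sum_{i=m_t+1}^{n} \left(h(i-t) + 1\right) \mathbb{P}[B = i] \\
&\le \sum_{i=m_t+1}^{n} e^{h(i-t)}\, \mathbb{P}[B = i] \\
&\le H(n,p,t),
\end{aligned}$$

which shows that $\mathbb{E}[f(B)]$ is smaller than Hoeffding's bound. Since we
assume that $t > \frac{epn}{ep - p + 1}$, it follows that $h > 1$ which in turn implies, since $t$ is an
integer, that $f(i) = 0$, for all $i \in \{0,1,\dots,t-1\}$. Hence we can write

$$\begin{aligned}
H(n,p,t) - \mathbb{E}[f(B)] &= \sum_{i=0}^{n} e^{h(i-t)}\, \mathbb{P}[B = i] - \sum_{i=t}^{n} \left(h(i-t) + 1\right) \mathbb{P}[B = i] \\
&= \sum_{i=0}^{t-1} e^{h(i-t)}\, \mathbb{P}[B = i] + \sum_{i=t+1}^{n} \left(e^{h(i-t)} - \left(h(i-t) + 1\right)\right) \mathbb{P}[B = i].
\end{aligned}$$

For $i \ge t+1$, we have

$$e^{h(i-t)} - \left(h(i-t) + 1\right) = \left(1 - \frac{1 + h(i-t)}{e^{h(i-t)}}\right) e^{h(i-t)} \ge \left(1 - \frac{1+h}{e^{h}}\right) e^{h(i-t)},$$

which implies that

$$H(n,p,t) - \mathbb{E}[f(B)] \ge \sum_{i=0}^{t-1} e^{h(i-t)}\, \mathbb{P}[B = i] + \left(1 - \frac{1+h}{e^{h}}\right) \left(H(n,p,t) - \sum_{i=0}^{t-1} e^{h(i-t)}\, \mathbb{P}[B = i] - \mathbb{P}[B = t]\right).$$

The result follows. □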


If $t$ is not an integer, then one may use the previous bound with $t$ replaced by
$\lfloor t \rfloor := \max\{k \in \mathbb{N} : k \le t\}$ since

$$\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right] \le \mathbb{P}\left[\sum_{i=1}^n X_i \ge \lfloor t \rfloor\right].$$

This result improves upon Hoeffding's bound by fitting a "missing" factor that
is equal to $\frac{1+h}{e^h} < 1$. Theorem 1.6 also allows us to perform comparisons with binomial
tails.
Proof of Theorem 1.8. Let $f(x) = \max\{0, x - t + 1\}$ so that $f(t-1) = 0$ and $f(\cdot) \in
\mathcal{F}_{ic}(t)$. Theorem 1.6 implies that

$$\mathbb{P}\left[\sum_{i=1}^n X_i \ge t\right] \le \mathbb{E}[f(B)],$$

where $B \sim \mathrm{Bin}(n,p)$. Since $t$ is a positive integer, we can write

$$\mathbb{E}[f(B)] = \sum_{i=t}^{n} (i - t + 1)\, \mathbb{P}[B = i] = \sum_{i=t}^{n} \mathbb{P}[B \ge i].$$

Now we use the following, well-known, estimate on binomial tails (see Feller [9],
page 151, formula (3.4)):

$$\mathbb{P}[B \ge i] \le \frac{i - ip}{i - np} \cdot \mathbb{P}[B = i], \quad \text{for } i > np.$$

Therefore,

$$\mathbb{E}[f(B)] \le \sum_{i=t}^{n} \frac{i - ip}{i - np} \cdot \mathbb{P}[B = i] \le \frac{t - tp}{t - np} \cdot \mathbb{P}[B \ge t],$$

as required. □


Compare this result with the second statement of Bentkus' Theorem 1.4 from
Section 1.1. Note that for large $t$, say $t \ge \frac{2np}{1+p}$, the previous result gives an estimate
for which $\frac{t-tp}{t-np} \le 2$. In a subsequent section we will show an extension of this
result.


4 The Bernstein-Hoeffding method


4.1 Proof of Theorem 1.9


We begin this section with the proof of Theorem 1.9. The proof borrows ideas from
the theory of Bernstein polynomials (see Phillips, Chapter 7). Recall that, for
a function $f : [0,1] \to \mathbb{R}$, the Bernstein polynomial corresponding to $f$ is defined
as

$$B_m(f,x) = \sum_{j=0}^{m} \binom{m}{j} x^{j}(1-x)^{m-j} f(j/m),$$

On the Bemstein-Hoeffding method 


C. Pelekis, J. Ramon, Y. Wang 


for each positive integer m. The following is a folklore result regarding Bernstein 
polynomials. 

Lemma 4.1. If $f : [0,1] \to [0, \infty)$ is convex, then

\[
f(x) \le B_m(f, x), \quad \text{for all } x \in [0,1].
\]

If $f : [0,1] \to [0, \infty)$ is continuous, then

\[
\sup_{x \in [0,1]} |f(x) - B_m(f, x)| \to 0, \quad \text{as } m \to \infty.
\]

Proof. See lETH Theorems 7.1.5 and 7.1.8. We remark that the first statement is easy 
to prove and the second arose from Bernstein's search for a proof of Weierstrass' 
theorem. □ 
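
Both statements of Lemma 4.1 are easy to observe numerically. The sketch below evaluates the Bernstein polynomial directly from its definition for the convex test function f(x) = x^2, which is our own choice of example; for this function one has B_m(f, x) = x^2 + x(1 - x)/m, so the uniform gap equals 1/(4m).

```python
from math import comb

def bernstein(f, m, x):
    """B_m(f, x) = sum_j binom(m, j) x^j (1 - x)^(m - j) f(j / m)."""
    return sum(comb(m, j) * x**j * (1 - x)**(m - j) * f(j / m) for j in range(m + 1))

def sup_gap(f, m, grid=201):
    """Maximum of |B_m(f, x) - f(x)| over an equispaced grid on [0, 1]."""
    xs = [i / (grid - 1) for i in range(grid)]
    return max(abs(bernstein(f, m, x) - f(x)) for x in xs)

def square(x):
    """A convex test function on [0, 1] (illustrative choice)."""
    return x * x

for m in (1, 10, 100):
    print(m, sup_gap(square, m))
```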


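Proof of Theorem 1.9. Let $f \in \mathcal{F}_{sic}$. Since $f$ is non-negative, increasing and
sub-multiplicative, Markov's inequality implies that

\[
\mathbb{P}\Big[\sum_{i=1}^{n} X_i \ge t\Big] \le \frac{1}{f(t)} \prod_{i=1}^{n} \mathbb{E}[f(X_i)] \le \frac{1}{f(t)} \Big\{ \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}[f(X_i)] \Big\}^{n},
\]

where the last estimate comes from the arithmetic-geometric means inequality.
Since $f$ is convex and $X_i \in [0,1]$, Lemma 4.1 implies that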


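\[
\mathbb{E}[f(X_i)] \le \mathbb{E}[B_m(f, X_i)].
\]

Now note that

\[
\mathbb{E}[B_m(f, X_i)] = \sum_{j=0}^{m} f(j/m) \binom{m}{j} \cdot \mathbb{E}\big[X_i^{j} (1 - X_i)^{m-j}\big].
\]

For $j = 0, 1, \ldots, m$ let

\[
\pi_j := \frac{1}{n} \sum_{i=1}^{n} \binom{m}{j} \cdot \mathbb{E}\big[X_i^{j} (1 - X_i)^{m-j}\big].
\]

Notice also that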


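\[
\mathbb{E}\big[X_i^{j} (1 - X_i)^{m-j}\big] = \sum_{k=0}^{m-j} \binom{m-j}{k} (-1)^{k}\, \mathbb{E}\big[X_i^{j+k}\big],
\]

which implies that $\mathbb{E}[X_i^{j}(1 - X_i)^{m-j}]$ is the same for all random variables from the
class $\mathcal{B}(\mu_{i1}, \ldots, \mu_{im})$. It is easy to verify that $\sum_{j=0}^{m} \pi_j = 1$; hence $\{\pi_j\}$, $j = 0, 1, \ldots, m$,
is a probability distribution on $\{0, 1, \ldots, m\}$. Now, if we define the random variable
$T_{nm}$ that takes on the value $j/m$ with probability $\pi_j$, $j = 0, 1, \ldots, m$, we have

\[
\frac{1}{f(t)} \Big\{ \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}[f(X_i)] \Big\}^{n} \le \frac{1}{f(t)} \big\{ \mathbb{E}[f(T_{nm})] \big\}^{n},
\]

as required. □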


Note that for m = 2 the previous result reduces to Theorem 1.5 from Section 2. In
particular, we conclude the following generalisation of Hoeffding's result.


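Corollary 4.2. With the same assumptions as in Theorem 1.9, we have

\[
\mathbb{P}\Big[\sum_{i=1}^{n} X_i \ge t\Big] \le \inf_{h > 0} e^{-ht} \Big\{ \sum_{j=0}^{m} \pi_j e^{hj/m} \Big\}^{n},
\]

where $\pi_j := \frac{1}{n} \sum_{i=1}^{n} \binom{m}{j} \cdot \mathbb{E}\big[X_i^{j}(1 - X_i)^{m-j}\big]$.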

Since $B_m(f, \cdot)$ converges uniformly to $f(\cdot)$, as $m \to \infty$, we conclude that $\mathbb{E}[f(T_{nm})]$
can be arbitrarily close to $\frac{1}{n} \sum_{i=1}^{n} \mathbb{E}[f(X_i)]$, provided that $m$ is sufficiently large.
Recall the definition of the class $\mathcal{F}_{ic}(t)$ from the introduction.
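
The bound of Corollary 4.2 is straightforward to evaluate once the weights are known. The following sketch is an illustration under our own assumptions: each X_i has a finite support (so that the expectations can be computed exactly), the infimum over h is approximated on a finite grid, and all names and the example distribution are hypothetical. For m = 1 the weights reduce to those of a Bernoulli variable, and the bound decreases as m grows because Bernstein polynomials of a convex function decrease in m.

```python
from math import comb, exp

def bernstein_weights(values, probs, m):
    """binom(m, j) * E[X^j (1 - X)^(m - j)], j = 0..m, for a discrete X on [0, 1]."""
    return [comb(m, j) * sum(q * x**j * (1 - x)**(m - j) for x, q in zip(values, probs))
            for j in range(m + 1)]

def averaged_weights(dists, m):
    """The weights pi_j: averages over i of the Bernstein weights of X_i."""
    n = len(dists)
    per_variable = [bernstein_weights(v, q, m) for v, q in dists]
    return [sum(w[j] for w in per_variable) / n for j in range(m + 1)]

def corollary_bound(dists, m, t, h_grid):
    """Grid approximation of inf_h e^{-ht} (sum_j pi_j e^{hj/m})^n."""
    n = len(dists)
    pi = averaged_weights(dists, m)
    return min(exp(-h * t) * sum(w * exp(h * j / m) for j, w in enumerate(pi)) ** n
               for h in h_grid)

# Hypothetical example: ten i.i.d. variables, uniform on {0.2, 0.5, 0.8}.
dists = [([0.2, 0.5, 0.8], [1 / 3, 1 / 3, 1 / 3])] * 10
h_grid = [0.05 * k for k in range(1, 201)]
for m in (1, 2, 4):
    print(m, corollary_bound(dists, m, 7, h_grid))
```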


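Theorem 4.3. Fix positive integers, $n, m \ge 2$ and for $i = 1, \ldots, n$ let $\{\mu_{ij}\}_{j=1}^{m}$ be
a sequence of reals such that $1 > \mu_{i1} \ge \cdots \ge \mu_{im} > 0$ and for which the class
$\mathcal{B}(\mu_{i1}, \ldots, \mu_{im})$ is non-empty. Let $X_1, \ldots, X_n$ be independent random variables such
that $X_i \in \mathcal{B}(\mu_{i1}, \ldots, \mu_{im})$, for $i = 1, \ldots, n$, and fix $t \in [0, n)$. Then

\[
\mathbb{P}\Big[\sum_{i=1}^{n} X_i \ge t\Big] \le \inf_{f \in \mathcal{F}_{ic}(t)} \frac{1}{f(t)}\, \mathbb{E}[f(Z_{nm})],
\]

where $Z_{nm} = \sum_{i=1}^{n} Z_i$ is an independent sum of random variables $Z_i$ such that

\[
\mathbb{P}[Z_i = j/m] = \binom{m}{j} \cdot \mathbb{E}\big[X_i^{j}(1 - X_i)^{m-j}\big], \quad \text{for } j = 0, 1, \ldots, m.
\]

Moreover,

\[
\inf_{f \in \mathcal{F}_{ic}(t)} \frac{1}{f(t)}\, \mathbb{E}[f(Z_{nm})] = \inf_{a < t} \frac{1}{t - a}\, \mathbb{E}[\max\{0, Z_{nm} - a\}].
\]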

Proof. The argument proceeds along the same lines as the proofs of the results in
Section 3.1 and Section 3.2 and so we only sketch it. Part of the proof of Theorem
1.9 yields $X_i \le_{cx} Z_i$, i.e., $\mathbb{E}[f(X_i)] \le \mathbb{E}[B_m(f, X_i)] = \mathbb{E}[f(Z_i)]$, for convex $f$. Since
the convex order is closed under convolutions, the first statement follows. The
proof of the second statement is almost identical to the proof of Theorem 3.4. □

In the previous result we found a random variable $Z_i$ such that $X_i \le_{cx} Z_i$, for
every $X_i \in \mathcal{B}(\mu_{i1}, \ldots, \mu_{im})$. Note that $\mathbb{E}[Z_i] = \mathbb{E}[X_i]$, for all $i = 1, \ldots, n$. However,
higher moments of $Z_i$ are not equal to the higher moments of $X_i$. Let us illustrate
this by assuming from now on that $m = 2$. Then $\mathbb{E}[Z_i^2] = \frac{1}{2}\mathbb{E}[X_i] + \frac{1}{2}\mathbb{E}[X_i^2] \ge \mathbb{E}[X_i^2]$
and so $Z_i$ may not belong to $\mathcal{B}(\mu_{i1}, \mu_{i2})$. Notice that this is not the case when $m = 1$;
i.e., when we consider random variables $X_i \in \mathcal{B}(\mu_{i1})$. In this case (see Theorem
1.6) we were able to find Bernoulli random variables $B_i$ from the class $\mathcal{B}(\mu_{i1})$ such
that $\mathbb{E}[f(X_i)] \le \mathbb{E}[f(B_i)]$, for all functions $f \in \mathcal{F}_{ic}(t)$. The following question
arises naturally from the above.
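
The moment identities for m = 2 can be verified exactly on a small example. The sketch below is an illustration under our own assumptions: X is taken to be a two-point variable on {1/4, 3/4} (a hypothetical choice), and exact rational arithmetic is used so that the equalities hold without rounding.

```python
from fractions import Fraction
from math import comb

def z_distribution(values, probs, m):
    """P[Z = j/m] = binom(m, j) E[X^j (1 - X)^(m - j)], j = 0..m, for a discrete X."""
    return [comb(m, j) * sum(q * x**j * (1 - x)**(m - j) for x, q in zip(values, probs))
            for j in range(m + 1)]

def moment(support, pmf, k):
    """k-th moment of a discrete distribution."""
    return sum(q * x**k for x, q in zip(support, pmf))

# Hypothetical example: X uniform on {1/4, 3/4}.
values = [Fraction(1, 4), Fraction(3, 4)]
probs = [Fraction(1, 2), Fraction(1, 2)]
m = 2
pmf = z_distribution(values, probs, m)
support = [Fraction(j, m) for j in range(m + 1)]

print("E[Z]   =", moment(support, pmf, 1), " E[X]   =", moment(values, probs, 1))
print("E[Z^2] =", moment(support, pmf, 2), " E[X^2] =", moment(values, probs, 2))
```

In this example the means agree while the second moment of Z is strictly larger than that of X, which is why Z falls outside the class with the prescribed first two moments.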

Question 4.4. Fix $\mu_1, \mu_2 \in (0,1)$ such that the class $\mathcal{B}(\mu_1, \mu_2)$ is non-empty. Does there
exist a random variable $\xi \in \mathcal{B}(\mu_1, \mu_2)$ such that $\mathbb{E}[f(X)] \le \mathbb{E}[f(\xi)]$, for all $X \in \mathcal{B}(\mu_1, \mu_2)$
and all increasing and convex functions $f : [0, \infty) \to [0, \infty)$?

It turns out that the answer to the question is no. In order to convince the reader
we will use Lemma 4.5 below, taken from Cohen et al. 0. Let us first fix some
notation. If $X \in \mathcal{B}(\mu_1, \mu_2)$, let $\sigma^2 := \mu_2 - \mu_1^2$ be its variance. Set $A = \mu_1 - \frac{\sigma^2}{1 - \mu_1}$
and let $C$ be the random variable that takes on the values $A$ and $1$ with probability
$\frac{(1-\mu_1)^2}{(1-\mu_1)^2 + \sigma^2}$ and $\frac{\sigma^2}{(1-\mu_1)^2 + \sigma^2}$ respectively. It is easy to verify that $C$ has mean $\mu_1$ and
variance $\sigma^2$. The following result is proven in Cohen et al. 0 and implies that $C$
has the maximum moments of any order, among all random variables in $\mathcal{B}(\mu_1, \mu_2)$.

Lemma 4.5. Let $X \in \mathcal{B}(\mu_1, \mu_2)$ and let $C$ be the random variable defined above. Then
$\mathbb{E}[X^k] \le \mathbb{E}[C^k]$, for every non-negative integer $k$.

Proof. See 0, Lemma 1.4.1. □ 

The following is an immediate consequence of the previous lemma. 
Corollary 4.6. Let $X \in \mathcal{B}(\mu_1, \mu_2)$. If $X$ is not the random variable $C$ of Lemma 4.5, then
$\mathbb{E}[e^{hX}] < \mathbb{E}[e^{hC}]$, for any $h > 0$.

Proof. Note that the inequality in the conclusion is strict. The previous lemma
implies that $\mathbb{E}[X^k] \le \mathbb{E}[C^k]$, for every non-negative integer $k$. Since $X$ is not
equal to $C$, there is at least one $k_0$ such that $\mathbb{E}[X^{k_0}] < \mathbb{E}[C^{k_0}]$; this follows from
the fact that the sequence of moments uniquely determines the random variable
(see Feller [10], Chapter VII.3). Therefore, Taylor expansion implies that $\mathbb{E}[e^{hX}] <
\mathbb{E}[e^{hC}]$. □

The following result implies that the previous question has a negative answer. 


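Proposition 4.7. Let $\mu_1, \mu_2 \in (0,1)$ be such that $\mu_1 > \mu_2$ and set $\sigma^2 = \mu_2 - \mu_1^2$. There
does not exist a random variable $\xi \in \mathcal{B}(\mu_1, \mu_2)$ such that

\[
\mathbb{E}[f(X)] \le \mathbb{E}[f(\xi)],
\]

for all $X \in \mathcal{B}(\mu_1, \mu_2)$ and every $f \in \mathcal{F}_{ic}(t)$.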


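Proof. We argue by contradiction. Suppose that such a $\xi$ does exist. The previous
Corollary implies that $\xi$ must be the random variable $C$, from Lemma 4.5. We now
define a random variable $C' \in \mathcal{B}(\mu_1, \mu_2)$ as follows. Let $C'$ be such that

\[
\mathbb{P}[C' = 0] = \frac{\sigma^2}{\mu_1^2 + \sigma^2} \quad \text{and} \quad \mathbb{P}\Big[C' = \frac{\mu_1^2 + \sigma^2}{\mu_1}\Big] = \frac{\mu_1^2}{\mu_1^2 + \sigma^2}.
\]

If $A = \mu_1 - \frac{\sigma^2}{1 - \mu_1}$ is as in Lemma 4.5, let us define the function $g(x) = \max\{0, \frac{x - A}{1 - A}\}$,
which is clearly increasing and convex. It is easy to verify that

\[
\mathbb{E}[g(C)] = \frac{\sigma^2}{(1 - \mu_1)^2 + \sigma^2} \quad \text{and} \quad \mathbb{E}[g(C')] = \frac{\mu_1 \sigma^2}{(\mu_1^2 + \sigma^2)\big((1 - \mu_1)^2 + \sigma^2\big)}.
\]

If we divide the last two equations we get

\[
\frac{\mathbb{E}[g(C)]}{\mathbb{E}[g(C')]} = \frac{\mu_1^2 + \sigma^2}{\mu_1} = \frac{\mu_2}{\mu_1} < 1,
\]

which contradicts the maximality of $\xi$. □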
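
The counterexample in the proof of Proposition 4.7 can be reproduced with exact arithmetic. The sketch below is only an illustration: the pair (mu_1, mu_2) = (1/2, 3/10) is a hypothetical choice, and the dictionaries map each support point to its probability.

```python
from fractions import Fraction

def extremal_pair(mu1, mu2):
    """Return A and the two-point laws of C and C' as {value: probability} dicts."""
    var = mu2 - mu1**2                    # sigma^2
    d = 1 - mu1
    A = mu1 - var / d
    C = {A: d**2 / (d**2 + var), Fraction(1): var / (d**2 + var)}
    C_prime = {Fraction(0): var / (mu1**2 + var),
               (mu1**2 + var) / mu1: mu1**2 / (mu1**2 + var)}
    return A, C, C_prime

def expect(dist, func):
    """Expectation of func under a finite distribution."""
    return sum(prob * func(x) for x, prob in dist.items())

# Hypothetical moments, given as exact fractions.
mu1, mu2 = Fraction(1, 2), Fraction(3, 10)
A, C, C_prime = extremal_pair(mu1, mu2)

def g(x):
    """The increasing convex test function max{0, (x - A) / (1 - A)}."""
    return max(Fraction(0), (x - A) / (1 - A))

print(expect(C, g), expect(C_prime, g), expect(C, g) / expect(C_prime, g))
```

Both variables have the prescribed first two moments, yet the expectation of g is larger under C' than under C, and the ratio equals mu_2 / mu_1.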


In the next section we exploit the fact that the random variable $Z_i$ from Theorem
4.3 is stochastically smaller than a particular binomial random variable. Using
this result, we will obtain a refined version of Theorem 1.8.
4.2 A refinement of Theorem 1.8


We begin this section by recalling the following, well-known, result of Hoeffding 
(see d. Theorem 4). 


Theorem 4.8 (Hoeffding, 1956). Fix a positive integer $s$ and let $q_1, \ldots, q_s$ be real numbers
from the interval $(0,1)$. Let $B_1, \ldots, B_s$ be independent Bernoulli trials with parameters
$q_1, \ldots, q_s$, respectively. Then

\[
\mathbb{P}\Big[\sum_{i=1}^{s} B_i > c\Big] \le \mathbb{P}[B(s, q) > c], \quad \text{when } c > sq,
\]

where $q = \frac{1}{s} \sum_{i=1}^{s} q_i$ and $B(s, q) \sim \mathrm{Bin}(s, q)$.
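
Hoeffding's comparison can be observed numerically by computing the law of a sum of heterogeneous Bernoulli trials with a short dynamic program. The sketch below is an illustration under our own assumptions: the parameters q_i are hypothetical, and the comparison is made for strict tails at integer thresholds c that are at least s q.

```python
from math import ceil, comb

def poisson_binomial_pmf(qs):
    """pmf of a sum of independent Bernoulli(q_i) trials, by dynamic programming."""
    pmf = [1.0]
    for q in qs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - q)      # the new trial fails
            nxt[k + 1] += mass * q        # the new trial succeeds
        pmf = nxt
    return pmf

def binomial_pmf(s, q):
    """pmf of Bin(s, q) as a list indexed by the number of successes."""
    return [comb(s, k) * q**k * (1 - q)**(s - k) for k in range(s + 1)]

def strict_tail(pmf, c):
    """P[S > c] for an integer c."""
    return sum(pmf[c + 1:])

# Hypothetical parameters.
qs = [0.1, 0.3, 0.5, 0.7, 0.9]
s, q_bar = len(qs), sum(qs) / len(qs)
hetero, homo = poisson_binomial_pmf(qs), binomial_pmf(s, q_bar)
for c in range(ceil(s * q_bar), s + 1):
    print(c, strict_tail(hetero, c), strict_tail(homo, c))
```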

Recall that a random variable $W$ is stochastically smaller than a random variable $V$,
if $\mathbb{P}[W > t] \le \mathbb{P}[V > t]$, for all $t$. Denote this by $W \le_{st} V$. It is well known, and
not so difficult to prove (see [29]), that $W \le_{st} V$ if and only if $\mathbb{E}[f(W)] \le \mathbb{E}[f(V)]$,
for every increasing function, $f$, for which the expectations exist. Moreover, the
stochastic order is closed under convolutions. The following result can be found 
in Misra et al. IH 8 | . 


Theorem 4.9 (Misra, Singh, Harner, 2003). Fix $m \ge 2$ and real numbers $\mu_1 \ge \cdots \ge \mu_m$
from the interval $(0,1)$. Suppose that $X$ is a random variable from $\mathcal{B}(\mu_1, \ldots, \mu_m)$. Let
$Z$ be the random variable that takes values on the set $\{\frac{0}{m}, \frac{1}{m}, \ldots, \frac{m}{m}\}$ with probabilities

\[
\mathbb{P}[Z = j/m] = \binom{m}{j} \cdot \mathbb{E}\big[X^{j}(1 - X)^{m-j}\big], \quad \text{for } j = 0, 1, \ldots, m.
\]

Then $Z$ is stochastically smaller than the random variable, $S$, that takes values on the set
$\{\frac{0}{m}, \frac{1}{m}, \ldots, \frac{m}{m}\}$ with probabilities

\[
\mathbb{P}[S = j/m] = \binom{m}{j}\, \mathbb{E}[X^m]^{j/m} \big(1 - \mathbb{E}[X^m]^{1/m}\big)^{m-j}, \quad \text{for } j = 0, 1, \ldots, m.
\]

Proof. See IHl, Theorem 4.1. □ 
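
The stochastic ordering of Theorem 4.9 can be inspected on a concrete example by comparing upper tails. The sketch below is an illustration under our own assumptions: X is a hypothetical three-point variable, m = 3, and the function names are not taken from the cited work.

```python
from math import comb

def z_pmf(values, probs, m):
    """P[Z = j/m] = binom(m, j) E[X^j (1 - X)^(m - j)], j = 0..m."""
    return [comb(m, j) * sum(w * x**j * (1 - x)**(m - j) for x, w in zip(values, probs))
            for j in range(m + 1)]

def s_pmf(values, probs, m):
    """pmf of S, where m * S ~ Bin(m, E[X^m]^(1/m))."""
    root = sum(w * x**m for x, w in zip(values, probs)) ** (1 / m)
    return [comb(m, j) * root**j * (1 - root)**(m - j) for j in range(m + 1)]

def upper_tails(pmf):
    """P[. >= j/m] for j = 0..m."""
    return [sum(pmf[j:]) for j in range(len(pmf))]

# Hypothetical example.
values, probs, m = [0.1, 0.4, 0.9], [0.2, 0.5, 0.3], 3
tz = upper_tails(z_pmf(values, probs, m))
ts = upper_tails(s_pmf(values, probs, m))
for j in range(m + 1):
    print(j / m, tz[j], ts[j])
```

Each upper tail of Z is dominated by the corresponding tail of S, with equality at the top point since both variables put mass E[X^m] on 1.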

Notice that the random variable $S$ is such that $m \cdot S$ has the distribution of a
$\mathrm{Bin}(m, \mathbb{E}[X^m]^{1/m})$ random variable. The following result is an analogue of Theorem
1.8 that takes into account the additional information on the moments.
Theorem 4.10. Fix positive integers, $n, m \ge 2$. For $i = 1, \ldots, n$ let $\{\mu_{i1}, \ldots, \mu_{im}\}$ be
an m-tuple of real numbers such that $1 > \mu_{i1} \ge \cdots \ge \mu_{im} > 0$ and for which the class
$\mathcal{B}(\mu_{i1}, \ldots, \mu_{im})$ is non-empty. Let $X_1, \ldots, X_n$ be independent random variables such that
$X_i \in \mathcal{B}(\mu_{i1}, \ldots, \mu_{im})$, for $i = 1, \ldots, n$. For $j = 1, \ldots, m$ set $q_j := \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}[X_i^j]^{1/j}$.
Fix a positive integer $t$ such that $nq_1 < t < n$. For $j = 1, \ldots, m-1$ let $I_j$ be the
interval $(nq_j + 1, nq_{j+1} + 1]$ and let $I_m$ be the interval $(nq_m + 1, n)$. If $t \in I_j$, for some
$j = 1, \ldots, m$, then

\[
\mathbb{P}\Big[\sum_{i=1}^{n} X_i \ge t\Big] \le \min_{1 \le s \le j} \frac{(st - s + 1)(1 - q_s)}{s\,(st - s + 1 - nsq_s)} \cdot \mathbb{P}[Bi(ns, q_s) \ge st - s + 1],
\]

where $Bi(ns, q_s) \sim \mathrm{Bin}(ns, q_s)$, for $s = 1, \ldots, j$.


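Proof. Note that $\{\mathbb{E}[X_i^j]^{1/j}\}_{j=1}^{m}$ is an increasing sequence, for all $i = 1, \ldots, n$. Fix
$j \in \{1, \ldots, m\}$ and let $s$ be such that $1 \le s \le j$. Define the function $f(x) =
\max\{0, x - t + 1\}$, $x \ge 0$ and note that $f(\cdot) \in \mathcal{F}_{ic}(t)$. Since $X_i \in \mathcal{B}(\mu_{i1}, \ldots, \mu_{im}) \subseteq
\mathcal{B}(\mu_{i1}, \ldots, \mu_{is})$, for all $i = 1, \ldots, n$, Theorem 4.3 implies that

\[
\mathbb{P}\Big[\sum_{i=1}^{n} X_i \ge t\Big] \le \mathbb{E}[\max\{0, Z_{ns} - t + 1\}],
\]

where $Z_{ns} = \sum_{i=1}^{n} Z_i$ and each $Z_i$ takes the value $\ell/s$, for $\ell = 0, 1, \ldots, s$, with probability

\[
\mathbb{P}[Z_i = \ell/s] = \binom{s}{\ell}\, \mathbb{E}\big[X_i^{\ell}(1 - X_i)^{s-\ell}\big].
\]

From Theorem 4.9 we know that each $Z_i$ is stochastically smaller than $\Xi_i$, where
$\Xi_i$ is such that $s \cdot \Xi_i \sim \mathrm{Bin}(s, \mathbb{E}[X_i^s]^{1/s})$, for $i = 1, \ldots, n$. Since $f(\cdot)$ is an increasing
function, and the stochastic order is closed under convolutions, we conclude that

\[
\mathbb{E}[\max\{0, Z_{ns} - t + 1\}] \le \mathbb{E}[\max\{0, \Xi_{ns} - t + 1\}],
\]

where $\Xi_{ns} = \sum_{i=1}^{n} \Xi_i$ is the independent sum of the $\Xi_i$'s. Now $\Xi_i \sim \frac{1}{s} \cdot B_i$, where
$B_i \sim \mathrm{Bin}(s, \mathbb{E}[X_i^s]^{1/s})$, and so

\[
\mathbb{E}[\max\{0, \Xi_{ns} - t + 1\}] = \frac{1}{s}\, \mathbb{E}[\max\{0, B_{ns} - st + s\}],
\]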


where $B_{ns} = \sum_{i=1}^{n} B_i$. Since $t$ is assumed to be an integer, we can write

\[
\mathbb{E}[\max\{0, B_{ns} - st + s\}] = \sum_{k=st-s+1}^{ns} (k - st + s) \cdot \mathbb{P}[B_{ns} = k] = \sum_{k=st-s+1}^{ns} \mathbb{P}[B_{ns} \ge k].
\]

Since $t > nq_s + 1$, Hoeffding's Theorem 4.8 implies that

\[
\mathbb{P}[B_{ns} \ge k] \le \mathbb{P}[Bi(ns, q_s) \ge k], \quad \text{for } k > nsq_s,
\]

where $Bi(ns, q_s) \sim \mathrm{Bin}(ns, q_s)$. Summarising, we have shown

\[
\mathbb{P}\Big[\sum_{i=1}^{n} X_i \ge t\Big] \le \frac{1}{s} \sum_{k=st-s+1}^{ns} \mathbb{P}[Bi(ns, q_s) \ge k].
\]

Finally, we use the following estimate on binomial tails (see Feller [9], page 151,
formula (3.4)):

\[
\mathbb{P}[Bi(ns, q_s) \ge k] \le \frac{k(1 - q_s)}{k - nsq_s} \cdot \mathbb{P}[Bi(ns, q_s) = k], \quad \text{for } k > nsq_s,
\]

and the result follows. □


In the next section we show that the Bernstein-Hoeffding method can be adapted 
to the case in which one has information on the conditional means of the random 
variables. 


5 Mixtures 

5.1 Convex orders and mixtures 

In this section we will work with independent, bounded random variables for
which we have information on their conditional distribution. Let us be more precise
after fixing some notation and definitions. Let $m \ge 2$ be a positive integer
and let $0 = r_0 < r_1 < \cdots < r_{m-1} < r_m = 1$ be real numbers forming the partition
$\{I_j\}_j$ of the interval $[0,1]$; where, for $j = 1, \ldots, m-1$, we set $I_j$ to be the interval
$[r_{j-1}, r_j)$ and $I_m = [r_{m-1}, r_m]$. Now let $\mathcal{B}(p, \{I_j, \mu_j\})$ be the class consisting of all
random variables $X \in \mathcal{B}(p)$ for which $\mathbb{E}[X \mid X \in I_j] = \mu_j$. Formally,

\[
\mathcal{B}(p, \{I_j, \mu_j\}) := \{X \in \mathcal{B}(p) : \mathbb{E}[X \mid X \in I_j] = \mu_j, \text{ for } j = 1, \ldots, m\}.
\]

Finally, let $\mathcal{C}(p, \{I_j, q_j\})$ be the class consisting of all random variables $X \in \mathcal{B}(p)$
for which $\mathbb{P}[X \in I_j] = q_j$, i.e.,

\[
\mathcal{C}(p, \{I_j, q_j\}) := \{X \in \mathcal{B}(p) : \mathbb{P}[X \in I_j] = q_j, \text{ for } j = 1, \ldots, m\}.
\]

Suppose that we have independent, bounded random variables for which we
know whether they belong to one of the above classes of random variables. In
this section we show how to employ the Bernstein-Hoeffding method in order to
obtain bounds that take the additional information into account. In order to do
so, we will need the notion of mixture of random variables. Recall that a mixture
of the random variables $\{Y_i\}_{i \in I}$ is defined as a random selection of one of the
$Y_i$ according to a probability distribution on the index set $I$. The next result is a
mixture-analogue of Lemma 2.1.

Lemma 5.1. Let $f : [0,1] \to \mathbb{R}$ be a convex function. Fix positive integer $m \ge 2$ and
real numbers $0 = r_0 < r_1 < \cdots < r_{m-1} < r_m = 1$. For $j = 1, \ldots, m-1$, let $I_j$ be the
interval $[r_{j-1}, r_j)$ and let $I_m = [r_{m-1}, r_m]$. If $X$ is a random variable in $\mathcal{B}(p)$, then there
exists a random variable $\xi_X$ whose support is the set $\{r_0, r_1, \ldots, r_m\}$ such that $\mathbb{E}[\xi_X] = p$
and $\mathbb{E}[f(X)] \le \mathbb{E}[f(\xi_X)]$.

Proof. Let $A_j$ be the event $\{X \in I_j\}$, $j = 1, \ldots, m$. Define $X_j$ to be the random
variable whose distribution is the conditional distribution of $X$, given $A_j$. It is
easy to see that $X$ is a mixture of $\{X_j\}_j$; $X_j$ is chosen with probability $\mathbb{P}[X \in I_j]$.
Now $X_j \in I_j$ and so Lemma 2.1 implies that $X_j \le_{cx} T_j$, where $T_j$ is the random
variable that takes the values $r_{j-1}$ and $r_j$ with probabilities $1 - \pi_j$ and
$\pi_j := \frac{\mathbb{E}[X_j] - r_{j-1}}{r_j - r_{j-1}}$, respectively. The required random variable can be obtained by letting $\xi_X$
take the value 0 with probability $(1 - \pi_1)\mathbb{P}[X \in I_1]$, the value 1 with probability
$\pi_m \mathbb{P}[X \in I_m]$ and, for $j = 1, \ldots, m-1$, the value $r_j$ with probability $\pi_j \mathbb{P}[X \in
I_j] + (1 - \pi_{j+1})\mathbb{P}[X \in I_{j+1}]$. □
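
The construction in the proof of Lemma 5.1 amounts to pushing the mass of each cell of the partition to the two endpoints of that cell while preserving the conditional mean. The sketch below is an illustration under our own assumptions: X is a hypothetical finitely supported variable, exact fractions are used, and the helper names are not from the paper.

```python
from fractions import Fraction

def endpoint_mixture(values, probs, cuts):
    """Law of xi_X on the cut points r_0 < ... < r_m, as a {point: probability} dict."""
    m = len(cuts) - 1
    mass = {r: Fraction(0) for r in cuts}
    for j in range(1, m + 1):
        lo, hi = cuts[j - 1], cuts[j]
        # I_j = [r_{j-1}, r_j), except that the last interval also contains r_m.
        cell = [(x, w) for x, w in zip(values, probs)
                if lo <= x < hi or (j == m and x == hi)]
        weight = sum(w for _, w in cell)                   # P[X in I_j]
        if weight == 0:
            continue
        cond_mean = sum(x * w for x, w in cell) / weight   # E[X | X in I_j]
        pi_j = (cond_mean - lo) / (hi - lo)                # mass that T_j puts on r_j
        mass[lo] += (1 - pi_j) * weight
        mass[hi] += pi_j * weight
    return mass

def expect(dist, func):
    """Expectation of func under a finite distribution."""
    return sum(w * func(x) for x, w in dist.items())

# Hypothetical example: X uniform on four points, partition [0, 1/2) and [1/2, 1].
values = [Fraction(1, 10), Fraction(3, 10), Fraction(6, 10), Fraction(9, 10)]
probs = [Fraction(1, 4)] * 4
cuts = [Fraction(0), Fraction(1, 2), Fraction(1)]
X = dict(zip(values, probs))
xi = endpoint_mixture(values, probs, cuts)
print(xi)
```

The resulting variable has the same mean as X and dominates it on convex test functions such as x^2.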

Note that the random variable $\xi_X$ of the previous lemma depends on the conditional
probabilities $\mathbb{P}[X \in I_j]$, $j = 1, \ldots, m$ as well as on the conditional means
$\mathbb{E}[X \mid X \in I_j]$, $j = 1, \ldots, m$. So, in case we know the conditional probabilities and
the conditional means of the random variables, we can find random variables
that are larger, in the sense of convex order, than any random variable having the
same conditional probabilities and means. Similarly, one can find a random variable
that is larger, in the sense of convex order, than any random variable from the
class $\mathcal{B}(p, \{I_j, \mu_j\})$, i.e., when we know conditional means.

Lemma 5.2. Let $\mathcal{B}(p, \{I_j, \mu_j\})$ be the class defined above, corresponding to a given partition
$\{I_j\}$ of the interval $[0,1]$. Then there exists a random variable $\xi \in \mathcal{B}(p)$ that concentrates
mass on the endpoints of the intervals $I_1$ and $I_m$ such that $X \le_{cx} \xi \le_{cx} \mathrm{Ber}(p)$, for
all $X \in \mathcal{B}(p, \{I_j, \mu_j\})$. The random variable $\xi$ depends on the partition $\{I_j\}$ and the conditional means $\{\mu_j\}$.

Proof. Fix a random variable $X \in \mathcal{B}(p, \{I_j, \mu_j\})$. Let $X_j$ be the random variable
whose distribution is the conditional distribution of $X$, given the event $X \in I_j$.
Lemma 5.1 implies that there exist random variables $T_j$, $j = 1, \ldots, m$, that concentrate
mass on the endpoints, $r_{j-1}, r_j$, of the intervals $I_j$ such that $X_j \le_{cx} T_j$. Note
that, by construction, $\mathbb{E}[T_j] = \mathbb{E}[X_j] = \mu_j$, for all $j = 1, \ldots, m$. Now let $\xi$ be a
mixture of the random variables $T_1$ and $T_m$; we take $\xi$ to be equal to $T_1$ with probability
$\frac{\mu_m - p}{\mu_m - \mu_1}$ and equal to $T_m$ with probability $\frac{p - \mu_1}{\mu_m - \mu_1}$. Note that $\mathbb{E}[\xi] = p$. We now
show that $X \le_{cx} \xi$. Fix a convex function $f : [0,1] \to \mathbb{R}$ and let $g : [0,1] \to \mathbb{R}$ be
the function whose graph is the line passing through the points $(\mu_1, \mathbb{E}[f(T_1)])$ and
$(\mu_m, \mathbb{E}[f(T_m)])$. Since $f$ is convex, we have $f(x) \le g(x)$, for all $x$ from the interval
$[r_1, r_{m-1}]$. Hence, writing $p_j := \mathbb{P}[X \in I_j]$,

\[
\mathbb{E}[f(X)] = \sum_{j=1}^{m} \mathbb{E}[f(X_j)] \cdot \mathbb{P}[X \in I_j]
\]
\[
\le p_1 \mathbb{E}[f(T_1)] + p_m \mathbb{E}[f(T_m)] + \sum_{j=2}^{m-1} p_j\, \mathbb{E}[f(X_j)]
\]
\[
\le p_1 g(\mu_1) + p_m g(\mu_m) + \sum_{j=2}^{m-1} p_j\, \mathbb{E}[g(X_j)].
\]

Since $g(\cdot)$ is linear, we have $g(\mu_j) = g(\mathbb{E}[X_j]) = \mathbb{E}[g(X_j)]$, $j = 1, \ldots, m$ and so the
last expression can be written as

\[
p_1 g(\mu_1) + p_m g(\mu_m) + \sum_{j=2}^{m-1} p_j\, \mathbb{E}[g(X_j)] = \mathbb{E}[g(X)].
\]

By linearity of $g(\cdot)$, we have $\mathbb{E}[g(X)] = g(p) = \mathbb{E}[g(\xi)]$. Summarising, we have
shown that

\[
\mathbb{E}[f(X)] \le \mathbb{E}[g(\xi)].
\]

Once again, linearity of $g(\cdot)$ implies

\[
\mathbb{E}[g(\xi)] = \frac{\mu_m - p}{\mu_m - \mu_1}\, \mathbb{E}[g(T_1)] + \frac{p - \mu_1}{\mu_m - \mu_1}\, \mathbb{E}[g(T_m)]
\]
\[
= \frac{\mu_m - p}{\mu_m - \mu_1}\, g(\mu_1) + \frac{p - \mu_1}{\mu_m - \mu_1}\, g(\mu_m)
\]
\[
= \frac{\mu_m - p}{\mu_m - \mu_1}\, \mathbb{E}[f(T_1)] + \frac{p - \mu_1}{\mu_m - \mu_1}\, \mathbb{E}[f(T_m)]
\]
\[
= \mathbb{E}[f(\xi)],
\]

and the result follows. □


The following theorem can be regarded as an improvement upon Hoeffding's inequality in
the case where one has additional information on the conditional means of the
random variables.


Theorem 5.3. Let the random variables $X_1, \ldots, X_n$ be independent and such that $0 \le
X_i \le 1$. Fix positive integer $m \ge 2$ and real numbers $0 = r_0 < r_1 < \cdots < r_{m-1} < r_m =
1$. For $j = 1, \ldots, m-1$, let $I_j$ be the interval $[r_{j-1}, r_j)$ and let $I_m = [r_{m-1}, r_m]$. Assume
further that for $i = 1, \ldots, n$ there is a sequence $\{\mu_{ij}\}_{j=1}^{m}$ such that $\mathbb{E}[X_i \mid X_i \in I_j] = \mu_{ij}$.
Let $p_i = \mathbb{E}[X_i]$. If $t$ is such that $np < t < n$, then there exist $\pi_1, \pi_2, \pi_3, \pi_4 \in (0,1)$ that
add up to 1, such that

\[
\mathbb{P}\Big[\sum_{i=1}^{n} X_i \ge t\Big] \le \inf_{h > 0} e^{-ht} \big\{ \pi_1 + e^{hr_1} \pi_2 + e^{hr_{m-1}} \pi_3 + e^{hr_m} \pi_4 \big\}^{n}.
\]


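Proof. The proof is similar to the proof of Theorem 1.6 and so we only sketch it.
Lemma 5.2 implies that $X_i \le_{cx} \xi_i$, for $i = 1, \ldots, n$, where each $\xi_i$ concentrates mass
on the endpoints, $r_0, r_1, r_{m-1}, r_m$, of the intervals $I_1$ and $I_m$. Hence, the arithmetic-geometric
means inequality implies

\[
\mathbb{P}\Big[\sum_{i=1}^{n} X_i \ge t\Big] \le e^{-ht} \prod_{i=1}^{n} \mathbb{E}\big[e^{h\xi_i}\big] \le e^{-ht} \Big\{ \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}\big[e^{h\xi_i}\big] \Big\}^{n}.
\]

Set $q_i = \frac{\mu_{im} - p_i}{\mu_{im} - \mu_{i1}}$, $s_i = \frac{r_1 - \mu_{i1}}{r_1 - r_0}$ and $u_i = \frac{r_m - \mu_{im}}{r_m - r_{m-1}}$, for $i = 1, \ldots, n$. Using Lemma
5.1 the result is obtained by setting $\pi_1 = \frac{1}{n} \sum_{i=1}^{n} q_i \cdot s_i$, $\pi_2 = \frac{1}{n} \sum_{i=1}^{n} q_i \cdot (1 - s_i)$,
$\pi_3 = \frac{1}{n} \sum_{i=1}^{n} (1 - q_i) \cdot u_i$ and $\pi_4 = \frac{1}{n} \sum_{i=1}^{n} (1 - q_i) \cdot (1 - u_i)$.
□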


In case one considers random variables from the class $\mathcal{C}(p, \{I_j, q_j\})$, the random
variable that is largest in the convex order is given by the solution of a linear program.
In particular, we have the following.

Lemma 5.4. Fix a convex and increasing function $f : [0, \infty) \to [0, \infty)$. Fix a positive
integer $m \ge 2$ and real numbers $0 = r_0 < r_1 < \cdots < r_{m-1} < r_m = 1$. For $j =
1, \ldots, m-1$, let $I_j$ be the interval $[r_{j-1}, r_j)$ and let $I_m = [r_{m-1}, r_m]$. Assume further that
there is a sequence $\{q_j\}_{j=1}^{m}$ such that the class $\mathcal{C}(p, \{I_j, q_j\})$ is non-empty. Then there is a
$\xi \in \mathcal{B}(p)$ such that $\xi \le_{cx} \mathrm{Ber}(p)$ and

\[
\mathbb{E}\big[e^{hX}\big] \le \mathbb{E}\big[e^{h\xi}\big], \quad \text{for all } X \in \mathcal{C}(p, \{I_j, q_j\}),
\]
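where $h$ is such that $e^h = \frac{t(1-p)}{p(n-t)}$, i.e., it is the optimal real such that

\[
e^{-ht}\, \mathbb{E}\big[e^{hB}\big] = \inf_{s > 0} e^{-st}\, \mathbb{E}\big[e^{sB}\big],
\]

with $B \sim \mathrm{Bin}(n,p)$. The random variable $\xi$ depends on the solution of a linear program.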

Proof. Since $\mathbb{E}[\xi] = p$, the first statement is evident by Lemma 2.1. Let $X \in
\mathcal{C}(p, \{I_j, q_j\})$ and set $\mu_j = \mathbb{E}[X \mid X \in I_j]$. From Lemma 5.1 we know that
there is a random variable $\xi_X$ that concentrates mass on the set $\{r_0, \ldots, r_m\}$
such that $\xi_X$ takes the value $r_0 = 0$ with probability $\pi_0 := q_1 \frac{r_1 - \mu_1}{r_1 - r_0}$, the value $r_m = 1$
with probability $\pi_m := q_m \frac{\mu_m - r_{m-1}}{r_m - r_{m-1}}$ and, for $j = 1, \ldots, m-1$, the value $r_j$ with
probability $\pi_j := q_j \frac{\mu_j - r_{j-1}}{r_j - r_{j-1}} + q_{j+1} \frac{r_{j+1} - \mu_{j+1}}{r_{j+1} - r_j}$. Therefore,

\[
\mathbb{E}\big[e^{h\xi_X}\big] = \pi_0 e^{hr_0} + \pi_m e^{hr_m} + \sum_{j=1}^{m-1} \pi_j e^{hr_j},
\]

which implies that $\mathbb{E}[e^{h\xi_X}]$ is a linear function of $\{\mu_j\}$. The required $\xi$ is obtained
by maximising $\mathbb{E}[e^{h\xi_X}]$ subject to the following linear constraints: $r_{j-1} \le \mu_j \le r_j$,
for $j = 1, \ldots, m$ and $\sum_{j=1}^{m} q_j \mu_j = p$. □

5.2 Yet another bound for cases with known variance 

Let us, for convenience, slightly change our notation and set $\mathcal{B}(p, \sigma^2)$ to be the class
of random variables from $\mathcal{B}(p)$ whose variance is $\sigma^2$. Throughout this section we
will assume that $\sigma^2$ is strictly positive. Hence $0 < \sigma^2 \le p(1-p)$. From Proposition
4.7 we know that there does not exist $\xi \in \mathcal{B}(p, \sigma^2)$ such that $X \le_{cx} \xi$, for all
$X \in \mathcal{B}(p, \sigma^2)$. From Lemma 2.1 we know that $X \le_{cx} \mathrm{Ber}(p)$, for all $X \in \mathcal{B}(p, \sigma^2)$
but $\mathrm{Ber}(p)$ does not belong to the class $\mathcal{B}(p, \sigma^2)$, when $\sigma^2 < p(1-p)$. In Theorem 4.3
we have obtained, using Bernstein polynomials, a random variable $Z_{p,\sigma} \in \mathcal{B}(p)$,
that does not belong to $\mathcal{B}(p, \sigma^2)$, such that $X \le_{cx} Z_{p,\sigma} \le_{cx} \mathrm{Ber}(p)$. In this section
we will construct another random variable having this property. More precisely,
we will prove the following.

Lemma 5.5. There exists a random variable $\xi_{p,\sigma} \in \mathcal{B}(p)$ such that

\[
X \le_{cx} \xi_{p,\sigma} \le_{cx} \mathrm{Ber}(p),
\]

for all $X \in \mathcal{B}(p, \sigma^2)$.

Depending on the value of $p$ and $\sigma^2$, the random variable $\xi_{p,\sigma}$ can yield efficiently
computable bounds that are sharper than existing, well-known, bounds. After
stating our main results, we will provide some figures that illustrate the differences
between the bounds. In order to construct $\xi_{p,\sigma}$ we will apply Lemma 5.1 to
the partition $[0, p) \cup [p, 1]$. We will also need the following result that is interesting
on its own.

Lemma 5.6. Suppose that $X, Y$ are two random variables from the class $\mathcal{B}(p, \sigma^2)$ and
consider the partition $[0, p) \cup [p, 1]$ of $[0, 1]$. Let $\xi_X$ and $\xi_Y$ be the associated random
variables given by Lemma 5.1. Then

\[
\xi_X \le_{cx} \xi_Y \quad \text{if and only if} \quad \mathbb{P}[\xi_X = p] \ge \mathbb{P}[\xi_Y = p].
\]

Proof. Assume first that $\xi_X \le_{cx} \xi_Y$. Then $\mathbb{E}[f(\xi_X)] \le \mathbb{E}[f(\xi_Y)]$ for the function
$f : [0,1] \to [0, \infty)$ having values $f(0) = f(1) = 1$, $f(p) = 0$ and which is linear on
the intervals $[0, p]$ and $[p, 1]$. Hence

\[
\mathbb{E}[f(\xi_X)] = 1 - \mathbb{P}[\xi_X = p] \le \mathbb{E}[f(\xi_Y)] = 1 - \mathbb{P}[\xi_Y = p]
\]

and so $\mathbb{P}[\xi_X = p] \ge \mathbb{P}[\xi_Y = p]$.

Assume now that $\mathbb{P}[\xi_X = p] \ge \mathbb{P}[\xi_Y = p]$. Then for any convex function $f : [0,1] \to
[0, \infty)$, we have $f(p) \le (1-p) \cdot f(0) + p \cdot f(1)$ and so

\[
\mathbb{E}[f(\xi_X)] - \mathbb{E}[f(\xi_Y)] \le f(0) \cdot (\mathbb{P}[\xi_X = 0] - \mathbb{P}[\xi_Y = 0])
\]
\[
+ f(1) \cdot (\mathbb{P}[\xi_X = 1] - \mathbb{P}[\xi_Y = 1])
\]
\[
+ \big((1-p) \cdot f(0) + p \cdot f(1)\big) \cdot (\mathbb{P}[\xi_X = p] - \mathbb{P}[\xi_Y = p])
\]
\[
= f(0) \cdot (\mathbb{E}[1 - \xi_X] - \mathbb{E}[1 - \xi_Y]) + f(1) \cdot (\mathbb{E}[\xi_X] - \mathbb{E}[\xi_Y])
= 0,
\]

where the last equality comes from the fact that $\mathbb{E}[\xi_X] = \mathbb{E}[\xi_Y]$. □


Proof of Lemma 5.5. Consider the class $\mathcal{B}(p, \sigma^2)$ with $0 < \sigma^2 \le p(1-p)$. For every
$X \in \mathcal{B}(p, \sigma^2)$ let $X_1$ be the random variable whose distribution is the conditional
distribution of $X$, given that $X \in [0, p)$. Let also $X_2$ be the random variable whose
distribution is the conditional distribution of $X$, given $X \in [p, 1]$. From Lemma 5.1
we know that there is a random variable $\xi_X$ such that $\mathbb{E}[X] = \mathbb{E}[\xi_X]$ and $X \le_{cx} \xi_X$.
Furthermore, $\xi_X$ is the mixture of random variables $B_1$ and $B_2$ such that $\mathbb{E}[X_i] =
\mathbb{E}[B_i]$ and $X_i \le_{cx} B_i$, for $i = 1, 2$. In addition, $B_1$ concentrates mass on the set
$\{0, p\}$ and $B_2$ concentrates mass on the set $\{p, 1\}$. Assume that $\xi_X$ is equal to $B_1$
with probability $\theta_X$. Clearly, $\xi_X$ depends on $X$ and we now show how one can get
rid of this dependence. Define $\xi_{p,\sigma}$ to be the random variable $\xi_X$, $X \in \mathcal{B}(p, \sigma^2)$ for
which

\[
\mathbb{P}[\xi_{p,\sigma} = p] = \inf_{Y \in \mathcal{B}(p, \sigma^2)} \mathbb{P}[\xi_Y = p].
\]

From Lemma 5.6 we have $Y \le_{cx} \xi_{p,\sigma}$, for all $Y \in \mathcal{B}(p, \sigma^2)$. Set $\ell_1 = p - \mathbb{E}[X_1]$ and
$\ell_2 = \mathbb{E}[X_2] - p$. Of course, $\ell_1, \ell_2$ depend on $X$. Since $p = \theta_X \mathbb{E}[X_1] + (1 - \theta_X) \mathbb{E}[X_2]$,
we can write

\[
\theta_X \ell_1 = (1 - \theta_X) \ell_2 \iff \theta_X = \frac{\ell_2}{\ell_1 + \ell_2}.
\]

The cases in which $\ell_1$ and $\ell_2$ are both equal to zero can be excluded since they correspond
to a constant random variable. We now proceed to find the distribution
of $\xi_{p,\sigma} = \xi_X$. Using Lemma 5.1 we compute


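\[
\mathbb{P}[\xi_X = p] = \theta_X\, \mathbb{P}[B_1 = p] + (1 - \theta_X)\, \mathbb{P}[B_2 = p]
\]
\[
= \frac{\ell_2}{\ell_1 + \ell_2} \cdot \frac{p - \ell_1}{p} + \frac{\ell_1}{\ell_1 + \ell_2} \cdot \frac{1 - p - \ell_2}{1 - p}
\]
\[
= 1 - \frac{\ell_1 \ell_2}{(1-p)p(\ell_1 + \ell_2)}
\]
\[
= 1 - \frac{1}{(1-p)p(1/\ell_1 + 1/\ell_2)}.
\]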
The last expression implies that $\mathbb{P}[\xi_X = p]$ is a decreasing function of $\ell_1$ and of $\ell_2$.
Similarly, one can check that

\[
\mathbb{P}[\xi_X = 0] = \frac{\ell_1 \ell_2}{p(\ell_1 + \ell_2)} \quad \text{and} \quad \mathbb{P}[\xi_X = 1] = \frac{\ell_1 \ell_2}{(1-p)(\ell_1 + \ell_2)}.
\]

By the Law of total variance we have

\[
\sigma^2 = \mathrm{Var}[X] = \theta_X \mathrm{Var}[X_1] + (1 - \theta_X) \mathrm{Var}[X_2] + \theta_X \ell_1^2 + (1 - \theta_X) \ell_2^2.
\]

Hence $\sigma^2 \ge \theta_X \ell_1^2 + (1 - \theta_X) \ell_2^2$ or, equivalently, $\sigma^2 \ge \ell_1 \ell_2$. Since $\mathbb{P}[\xi_X = p]$ is a
decreasing function of $\ell_1$ and of $\ell_2$ it follows that it attains its minimum when
$\sigma^2 = \ell_1 \ell_2$ and this, in turn, implies that

\[
\mathbb{P}[\xi_X = p] = 1 - \frac{\sigma^2}{(1-p)p(\ell_1 + \ell_2)}.
\]

Therefore, in order to minimise $\mathbb{P}[\xi_X = p]$ it is enough to solve the following optimization
problem:

\[
\min_{\ell_1, \ell_2}\ \ell_1 + \ell_2 \quad \text{s.t.} \quad \ell_1 \ell_2 = \sigma^2, \quad 0 \le \ell_1 \le p, \quad 0 \le \ell_2 \le 1 - p.
\]

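Elementary, though quite tedious, calculations show that the optimal solution
$(\ell_1, \ell_2)$ equals

\[
(\ell_1, \ell_2) = \begin{cases} \big(\frac{\sigma^2}{1-p},\, 1-p\big), & \text{if } \sigma > 1 - p \\ \big(p,\, \frac{\sigma^2}{p}\big), & \text{if } \sigma > p \\ (\sigma, \sigma), & \text{if } \sigma \le \min\{p, 1-p\}. \end{cases}
\]

Therefore, the required random variable has the following distribution:

- If $\sigma > 1 - p$, then $\xi_{p,\sigma}$ takes the values 0, $p$ and 1 with probability
$\frac{(1-p)\sigma^2}{p((1-p)^2 + \sigma^2)}$, $1 - \frac{\sigma^2}{p((1-p)^2 + \sigma^2)}$ and $\frac{\sigma^2}{(1-p)^2 + \sigma^2}$ respectively.

- If $\sigma > p$, then $\xi_{p,\sigma}$ takes the values 0, $p$ and 1 with probability
$\frac{\sigma^2}{p^2 + \sigma^2}$, $1 - \frac{\sigma^2}{(1-p)(p^2 + \sigma^2)}$ and $\frac{p\sigma^2}{(1-p)(p^2 + \sigma^2)}$ respectively.

- If $\sigma \le \min\{p, 1-p\}$, then $\xi_{p,\sigma}$ takes the values 0, $p$ and 1 with probability
$\frac{\sigma}{2p}$, $1 - \frac{\sigma}{2(1-p)p}$ and $\frac{\sigma}{2(1-p)}$ respectively.

□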
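
The three regimes above translate directly into a small routine. The following is a minimal sketch, assuming the formulas for the optimal pair and for the masses at 0 and 1 derived in the proof; the function name is hypothetical.

```python
from math import sqrt

def xi_distribution(p, var):
    """(P[xi = 0], P[xi = p], P[xi = 1]) for the variable constructed in the proof."""
    if not 0 < var <= p * (1 - p):
        raise ValueError("need 0 < var <= p (1 - p)")
    sigma = sqrt(var)
    if sigma > 1 - p:
        l1, l2 = var / (1 - p), 1 - p
    elif sigma > p:
        l1, l2 = p, var / p
    else:
        l1, l2 = sigma, sigma
    # In every regime l1 * l2 = var, so the masses follow from the formulas
    # P[xi = 0] = l1 l2 / (p (l1 + l2)) and P[xi = 1] = l1 l2 / ((1 - p)(l1 + l2)).
    at_zero = l1 * l2 / (p * (l1 + l2))
    at_one = l1 * l2 / ((1 - p) * (l1 + l2))
    return at_zero, 1 - at_zero - at_one, at_one

for p, var in [(0.5, 0.04), (0.2, 0.09), (0.8, 0.09)]:
    print(p, var, xi_distribution(p, var))
```

In each case the returned masses sum to one and the mean of the resulting three-point variable equals p.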

Proof of Theorem 1.10. The proof of Theorem 1.10 is an application of Lemma 5.5.
It is very similar to the proof of Theorem 1.6 and Theorem 3.4 and so we briefly
sketch it. The first statement can be proven in the same way as Theorem 1.6. The
second statement follows from the fact that JT G { 0 , p, 2p ,..., np} and by 
looking at the smallest positive integer, $m_t$, that is $\ge t$. As in Theorem 3.4 we
can find an $\varepsilon > 0$ such that the function, $\phi(\cdot)$, that is equal to 0 for $x \le \varepsilon$ and,
for $x > \varepsilon$, it is a straight line passing through the points $(t, f(t))$ and $(m_t, f(m_t))$
satisfies $\phi(t) = 1$ and

E 



< E 



for a supposedly optimal function /(•) with /(f) = 1. □ 

It is not easy to find a closed form of the bound given by Theorem 1.10. Nevertheless,
the bound can be easily implemented. Note that the previous bound
concerns functions from the class $\mathcal{F}_{ic}(t)$. We end this section by performing some
pictorial comparisons between several bounds discussed in this article. Before 
doing so, let us bring to the reader's attention the following, well-known, bound 
that is due to Bennett [1]. Bennett's approach was simplified by Cohen et al. 0. 
In particular, by employing the Bernstein-Hoeffding method to the exponential 
function, Cohen et al. have shown the following. 


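Theorem 5.7 (Bennett bound). Fix positive integer $n$ and assume that we are given a
pair $(p, \sigma^2)$ for which the class $\mathcal{B}(p, \sigma^2)$ is non-empty. Let $X_1, \ldots, X_n$ be independent
random variables such that $X_i \in \mathcal{B}(p, \sigma^2)$, for $i = 1, \ldots, n$. Fix $t \in (np, n)$. Then

\[
\mathbb{P}\Big[\sum_{i=1}^{n} X_i \ge t\Big] \le \Big\{ \Big(\frac{\alpha}{\beta}\Big)^{\beta} \Big(\frac{1-\alpha}{1-\beta}\Big)^{1-\beta} \Big\}^{n},
\]

where $\alpha = \frac{\sigma^2}{\sigma^2 + (1-p)^2}$ and $\beta = \frac{\sigma^2 + (t/n - p)(1-p)}{\sigma^2 + (1-p)^2}$.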


Proof. See 0. 


□ 
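
Bennett's bound is simple to evaluate. The sketch below is an illustration under our own assumptions: it writes the bound in its relative-entropy form exp(-n D(beta || alpha)), with alpha and beta as in Theorem 5.7, and compares it with the classical Hoeffding bound for known mean; the function names are hypothetical. When the variance takes its largest possible value p(1 - p) one gets alpha = p and beta = t/n, so the two bounds coincide.

```python
from math import exp, log

def bennett_bound(n, p, var, t):
    """exp(-n D(beta || alpha)) with alpha, beta as in the Bennett bound."""
    b = 1 - p
    alpha = var / (var + b * b)
    beta = (var + (t / n - p) * b) / (var + b * b)
    kl = beta * log(beta / alpha) + (1 - beta) * log((1 - beta) / (1 - alpha))
    return exp(-n * kl)

def hoeffding_bound(n, p, t):
    """Hoeffding's bound for [0, 1]-valued variables with mean p, np < t < n."""
    a = t / n
    return exp(-n * (a * log(a / p) + (1 - a) * log((1 - a) / (1 - p))))

# Hypothetical parameters: n = 20, p = 1/2, t = 14.
for var in (0.05, 0.15, 0.25):
    print(var, bennett_bound(20, 0.5, var, 14), hoeffding_bound(20, 0.5, 14))
```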
[Figure 1, panels: (a) t = 0.30 x 20; (b) t = 0.55 x 20; (c) t = 0.80 x 20; (d) t = 0.35 x 20; (e) t = 0.60 x 20; (f) t = 0.85 x 20; (g) t = 0.40 x 20; (h) t = 0.65 x 20; (i) t = 0.90 x 20.]

Figure 1: Comparison of Bennett's bound and the bounds of Theorems 4.3 and
1.10 for $X_i \in \mathcal{B}(p, \sigma^2)$. The first column corresponds to $p = 1/4$. The second
column corresponds to $p = 1/2$. Finally, the third column corresponds to the
value $p = 3/4$. We set $n = 20$ and $t = (p + \varepsilon)n$, for particular choices of $\varepsilon$. The blue
curves represent Bennett's bound; the orange curves represent the bound given
by Theorem 1.10; the green curves correspond to the bound given by Theorem
4.3. The abscissae are the variances.


Our numerical experiments suggest that, when $\sigma^2$ is not very small, the bound
given by Theorem 1.10 is tighter than Bennett's bound. Note that we can also
apply the bound given by Theorem 4.3 to random variables from the class $\mathcal{B}(p, \sigma^2)$;
it is not difficult to implement this bound. In order to build a concrete mental
image let us fix the parameter $p$ and consider random variables $X_i$, $i = 1, \ldots, n$
such that $X_i \in \mathcal{B}(0.5, \sigma^2)$. In a similar way as in Proposition 3.7 one can show that
it suffices to consider the infimum, in the bound of Theorem 1.10, over the set
$K := \{k/2 : k \text{ is a nonnegative integer and } k/2 < t\}$. We can now put the computer
to work to calculate the bound.



Figure 1 shows comparisons between Bennett's bound, the bound obtained in
Theorem 4.3 and the bound of Theorem 1.10. The abscissae in these figures correspond
to the variance. Notice that, when the variance is large, the bounds given
by Theorems 4.3 and 1.10 are sharper than the Bennett bound.

In the next section we stress a limitation of the Bernstein-Hoeffding method.

6 Unbounded random variables 

So far we have employed the Bernstein-Hoeffding method to sums of independent
and bounded random variables. The reader may wonder whether the method
can be employed in order to obtain bounds on deviations from the expectation
for sums of independent, non-negative and unbounded random variables. We will
show, in this section, that in this case the method yields a bound that is the same
as the bound given by Markov's inequality. Let us remark that this fact was already
known to Hoeffding (see the footnote in [15], page 15) but we were not able
to find a proof; we include a proof for the sake of completeness. Hence the case of
non-negative and unbounded random variables requires different methods and
the reader is invited to take a look at the work of Samuels [25], [26], [27] and Feige
[5] for further details and references. The case of non-negative and unbounded
random variables seems to be less investigated than the case of bounded random
variables. Talagrand (see [31], page 692, Comment 3) already mentions that it is
unclear how to improve Hoeffding's inequality without the assumption that the
random variables are bounded from above. Let us fix some notation.
For given $\mu > 0$, let $\mathcal{U}(\mu)$ be the class of non-negative random variables whose mean
equals $\mu$. Formally,

\[
\mathcal{U}(\mu) = \{X : X \ge 0,\ \mathbb{E}[X] = \mu\}.
\]

Now, for $i = 1, \ldots, n$, fix $\mu_i > 0$ and $X_i \in \mathcal{U}(\mu_i)$. If $t > \sum_i \mu_i$, then one can
estimate

\[
\mathbb{P}\Big[\sum_{i=1}^{n} X_i \ge t\Big] \le \frac{1}{f(t)}\, \mathbb{E}\Big[f\Big(\sum_{i=1}^{n} X_i\Big)\Big],
\]

where $f(\cdot)$ is a non-negative, convex and increasing function. A crucial step in
the Bernstein-Hoeffding method is to minimise the right hand side of the last
inequality with respect to $f(\cdot)$. We may assume that we minimise over those
functions $f$ for which $f(t) = 1$. We now show that this minimisation leads to
a bound that is the same as Markov's. Note that Markov's inequality yields
$\mathbb{P}\big[\sum_{i=1}^{n} X_i \ge t\big] \le \frac{\sum_{i=1}^{n} \mu_i}{t}$. Recall the definition of the class $\mathcal{F}_{ic}(t)$, from the Introduction,
and let $\mathcal{Z}_{ic}(t)$ be the class consisting of all functions $f \in \mathcal{F}_{ic}(t)$ such that
$f(t) = 1$. In this section we report the following.


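Proposition 6.1. With the same notation as above, we have

\[
V := \inf_{f \in \mathcal{Z}_{ic}(t)}\ \sup_{X_i \in \mathcal{U}(\mu_i)} \mathbb{E}\Big[f\Big(\sum_{i=1}^{n} X_i\Big)\Big] = \frac{\sum_{i=1}^{n} \mu_i}{t}.
\]

Proof. For $i = 1, \ldots, n$, let $Y_i$ be the random variable that takes the values 0 and $t$
with probabilities $1 - \frac{\mu_i}{t}$ and $\frac{\mu_i}{t}$, respectively. Clearly, we have

\[
V \ge \inf_{f \in \mathcal{Z}_{ic}(t)} \mathbb{E}\Big[f\Big(\sum_{i=1}^{n} Y_i\Big)\Big].
\]

In a similar way as in Theorem 3.4 one can show that

\[
\inf_{f \in \mathcal{Z}_{ic}(t)} \mathbb{E}\Big[f\Big(\sum_{i=1}^{n} Y_i\Big)\Big] = \inf_{\varepsilon \in [0,t)} \mathbb{E}\Big[\max\Big\{0, \frac{\sum_{i=1}^{n} Y_i - \varepsilon}{t - \varepsilon}\Big\}\Big].
\]

Since $\sum_i Y_i \in \{0, t, 2t, \ldots, nt\}$, a similar argument as in Proposition 3.7 shows that
the optimal $\varepsilon$ in the right hand side of the last equation is equal to 0. Therefore,

\[
\inf_{\varepsilon \in [0,t)} \mathbb{E}\Big[\max\Big\{0, \frac{\sum_{i=1}^{n} Y_i - \varepsilon}{t - \varepsilon}\Big\}\Big] = \mathbb{E}\Big[\frac{\sum_{i=1}^{n} Y_i}{t}\Big] = \frac{\sum_{i=1}^{n} \mu_i}{t},
\]

and the result follows. □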
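
The last step of the proof can be checked by brute force for a small number of variables. The sketch below is an illustration under our own assumptions: the Y_i are taken to be independent, the means mu_i and the threshold t are hypothetical, and the expectation is computed by enumerating all outcomes.

```python
from itertools import product

def hinge_value(mus, t, eps):
    """E[max{0, (sum_i Y_i - eps) / (t - eps)}], Y_i in {0, t}, P[Y_i = t] = mu_i / t."""
    total = 0.0
    for outcome in product((0, 1), repeat=len(mus)):
        prob = 1.0
        for hit, mu in zip(outcome, mus):
            prob *= mu / t if hit else 1 - mu / t
        s = t * sum(outcome)                       # value of the sum on this outcome
        total += prob * max(0.0, (s - eps) / (t - eps))
    return total

# Hypothetical means and threshold.
mus, t = [0.5, 1.0, 1.5], 4.0
markov = sum(mus) / t
for eps in (0.0, 1.0, 2.0, 3.0):
    print(eps, hinge_value(mus, t, eps), markov)
```

The value at eps = 0 coincides with Markov's bound, and no other choice of eps on the grid does better.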
Hence, in the case of non-negative and unbounded random variables, the method
cannot yield a bound that is better than Markov's bound.